Impact of Microwave Depolarization During
Multipath Fading on Digital Radio Performance 


By S. H. LIN 
(Manuscript received October 20, 1976) 


Experimental data describing the statistics of microwave depolar- 
ization during multipath fading have been obtained from a propagation 
experiment conducted near Atlanta, Georgia. The experiment included 
6- and 11-GHz reception on a 26.4-mile path, and 11-GHz reception 
on a 15.9-mile path. A theoretical model, suggested by T. O. Mottl, indicates
that the interference occasioned by depolarization for a given
copolarized signal level is Rice-Nakagami distributed. The theoretically 
calculated distribution agrees well with the data. The cross-polarization 
interference consists of a signal-level-dependent component as well 
as a residual that is independent of the in-line signal level. The residual 
is Rayleigh distributed with an rms value about 40 dB below the non- 
faded in-line signal level, and limits the multipath fade margin of a 
cochannel dual-polarized digital radio to approximately 30 dB. Cal- 
culated multipath outage probabilities for cochannel, dual-polarized, 
11-GHz, quaternary-coherent-phase-shift-keyed digital radios with 
and without space-diversity protection are presented. 


I. INTRODUCTION


Maintaining adequate cross-polarization discrimination (XPD) is
important to both analog and digital radio transmission systems. For
analog radio and single-polarization digital radio, adequate XPD allows
reduction of frequency separation between cross-polarized, adjacent
channels to increase the transmission capacity. This is known as
interstitial operation in channel allocation of analog radio. Some digital
radio may also rely on XPD to achieve high transmission capacity by a
two-channel-per-frequency allocation in which both orthogonal linear
polarizations in the same frequency band are employed as two independent
transmission paths.

Fig. 1. Path layout, frequencies, and antennas for microwave propagation experiments
near Atlanta, Georgia.

References 1 to 5 indicate that XPD can degrade significantly during 
multipath fading. Therefore, statistics of XPD during multipath fading 
are needed to assess the performance and reliability of both analog and 
digital radio. Section II of this paper describes a propagation experiment, 
the measured data, and a theoretical model for microwave depolarization 
during multipath fading. Section III applies these results to calculate 
the multipath-caused outages of dual-polarization 11-GHz
quaternary-coherent-phase-shift-keyed (QCPSK) digital radio. A companion
paper⁶ by T. O. Mottl gives greater detail on outage probabilities of
QCPSK digital radio.


II. MICROWAVE DEPOLARIZATION DURING MULTIPATH FADING


2.1 Introduction 

Section 2.2 describes microwave propagation experiments; Section 
2.3 presents multipath fading and associated depolarization statistics 
including cumulative amplitude distributions, number of fades, and 
average fade durations. Section 2.4 discusses a theoretical model,
suggested by T. O. Mottl,⁶ to describe the behavior of XPD during multipath
fading.
Table I. Path parameters of microwave propagation experiments near Atlanta, Georgia

                       Path length    Freq.    (v_xp)rms when     K       ε_rms   Transmitter
Path                   Miles   km     (GHz)    V_IL = 0 dB (dB)   (dB)    (dB)    polarization   r
Atlanta-Palmetto       26.4    42.5   6.031    -25.9              -25.9   -47.6   Vertical       0.13
Atlanta-Palmetto       26.4    42.5   11.605   -19.5              -19.6   -36.3   Vertical       0.4
Villa Rica-Palmetto    15.9    25.6   11.465   -42.7              -46.2   -45.3   Vertical       0.015

r = Multipath fade occurrence factor defined in eqs. (32) and (33).
$v_{xp} = |k v_{IL} + \epsilon e^{j\phi}|$


2.2 Microwave propagation experiments 


Figure 1 displays the path layout for the Palmetto propagation ex- 
periment. Two vertically polarized signals with frequencies of 6.031 and 
11.605 GHz are transmitted over the 26.4-mile path from Atlanta to 
Palmetto, and one vertically polarized signal with a frequency of 11.465 
GHz is transmitted over the 15.9-mile path from Villa Rica to Palmetto. 
At Palmetto, the common receiving site, the levels of both vertically and 
horizontally polarized received signals are recorded. 

Both the transmitter and the receiver on the Atlanta-Palmetto path 
employ standard Bell System horn reflector antennas, waveguides, and 
channel separation networks.⁷,⁸ Circular waveguide (WC281) simultaneously
supports both polarizations of the 6- and 11-GHz signals. On the
Villa Rica-Palmetto path, a standard Bell System horn reflector antenna, 
waveguide, and network are used to transmit the 11-GHz signal. The 
11-GHz receiver at Palmetto utilizes a 6-foot dish antenna with two el- 
liptical waveguides to separately carry two orthogonally polarized re- 
ceived signals. 

The measured cross-polarization discrimination (XPD) obtained on
these paths during nonfading periods is listed in Table I. The best
performance is 42.7 dB on the 11-GHz Villa Rica-Palmetto path; the worst,
19.5 dB on the 11-GHz Atlanta-Palmetto path. Poor XPD on the 11-GHz
Atlanta-Palmetto path is believed due to the 4- and 6-GHz channel-separation
networks at both transmitting and receiving ends, as well as the
fact that this link requires quite long waveguides (see Table II).* The
WC281 waveguide is an overmoded guide at the 11-GHz frequency,
which supports 21 higher-order modes in addition to the desired
fundamental.⁹ Slight imperfections on this long (see Table II) waveguide run
can cause mode coupling with resultant depolarization.¹⁰ The
imperfections in antennas, antenna misalignment, and channel-separation
networks also contribute to depolarization.

* The high fill of radio traffic on this link limits opportunity to study the hardware
impact on 11-GHz XPD.

Table II. Waveguide types and lengths

                     Waveguide      Waveguide
Station              Length (ft)    Type
A. Atlanta-Palmetto Path
Atlanta              75             EW-107
                     15             WR-90
                     20             WC-281
Palmetto             54             EW-107
                     5              WR-90 Flex-guide
                     300            WC-281
B. Villa Rica-Palmetto Path
Villa Rica           85             Elliptical
                     21             WR-90
                     170            WC-281
Palmetto             80             Elliptical

In the following, the in-line signal refers to the received (vertically 
polarized) signal which is “in-line” with the transmitted signal; the 
cross-polarization interference refers to the received horizontally po- 
larized signal which is orthogonal to it. 


2.3 Experimental Data 


The experimental data obtained during the 6.5-month period from 
August 15, 1974 to February 28, 1975 have been processed. 


2.3.1 Statistics of in-line signals 


The measured statistics of multipath fading of the in-line signals are 
shown in Figs. 2, 3, and 4 for the cumulative amplitude distribution, the 
number of fades, and the average fade durations, respectively, as func- 
tions of fade depth. 

In the deep-fade region (≥ 20 dB), the slopes of the distributions in
Figs. 2, 3, and 4 are consistent with the theoretical distribution for deep
fades.¹¹-¹⁴ The cumulative amplitude distribution has an inverse slope
of 10 dB per decade of probability, the number of fades has an inverse
slope of 20 dB per decade of number of fades, and the average fade
duration has an inverse slope of 20 dB per decade of duration.
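
The quoted slopes can be read as power laws in the normalized amplitude L of the in-line signal. The following is a minimal sketch, assuming only that one decade of amplitude corresponds to 20 dB of fade depth; the function name `power_law_exponent` is illustrative and not part of the original analysis.

```python
def power_law_exponent(inverse_slope_db_per_decade):
    """Exponent n such that the quantity varies as L**n.

    A tenfold change in the amplitude L is 20 dB of fade depth, so an
    inverse slope of s dB per decade of the quantity gives n = 20 / s.
    """
    return 20.0 / inverse_slope_db_per_decade


# Inverse slopes quoted in the text for the deep-fade region.
exponents = {
    "cumulative amplitude distribution": power_law_exponent(10.0),
    "number of fades": power_law_exponent(20.0),
    "average fade duration": power_law_exponent(20.0),
}

for name, n in exponents.items():
    print(f"{name}: proportional to L^{n:g}")
```

Under this reading the cumulative distribution varies as the square of L, which agrees with the deep-fade form used later in eq. (33).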


2.3.2 Statistics of cross-polarization interference 


Let $v_{IL}(t)$ and $v_{xp}(t)$ be the time-varying amplitudes of the in-line
signal voltage and the depolarized (interference) voltage, respectively,
both normalized to the nonfaded in-line signal voltage. The
cross-polarization discrimination (XPD) can be written as

$$\mathrm{XPD} = 20 \log_{10}(v_{IL}/v_{xp}) \ \mathrm{dB} \tag{1}$$

$$= V_{IL} - V_{xp}, \tag{2}$$

where

$$V_{IL} = 20 \log_{10} v_{IL} \ \mathrm{dB} \tag{3}$$

is the time-varying, in-line signal level in dB with respect to its nonfaded
level, and

$$V_{xp} = 20 \log_{10} v_{xp} \ \mathrm{dB} \tag{4}$$

is the time-varying, depolarized component of the signal measured (in
dB) with respect to the nonfaded level of in-line signal.

Fig. 2. Measured 6.5-month data (August 15, 1974 to February 28, 1975) on the
cumulative amplitude distributions of in-line signals during multipath fading.

Fig. 3. Measured 6.5-month data (August 15, 1974 to February 28, 1975) on number
of fades of in-line signals during multipath fading.

Figure 5 shows fluctuations of $V_{IL}$ and $V_{xp}$ measured on the 11-GHz,
Atlanta-Palmetto path during a typical nonfading hour. Both $V_{IL}$ and
$V_{xp}$ scintillate, but approximately 20 dB of XPD is maintained. Figure
6 shows the behavior of $V_{IL}$ and $V_{xp}$ during a multipath fading hour. In
the 40-minute period from 8:00 a.m. to 8:40 a.m., the variations of $V_{IL}$
and $V_{xp}$ appear to be relatively well correlated. However, at 8:44 a.m.,
$V_{IL}$ suffers a 35-dB fade, whereas $V_{xp}$ undergoes only a 15-dB fade (from
-20 dB to -35 dB), thereby degrading the XPD to 0 dB. In other words,
at this moment, the signal received at the cross-polarized feed
(interference to a signal normally transmitted with horizontal polarization
that has faded by the same amount as $V_{IL}$) is as strong as the desired
horizontally polarized in-line signal and will prevent useful transmission
of data over the channel; hence, an outage is caused to a
2-channel-per-frequency-assignment digital radio. Figure 7 shows another example
of depolarizing fading on the 11-GHz, Villa Rica-Palmetto path.

Fig. 4. Measured 6.5-month data (August 15, 1974 to February 28, 1975) on average
fade durations of in-line signals during multipath fading.
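
As a small numerical illustration of eqs. (1) through (4), the sketch below evaluates the XPD for the two situations just described, using the approximate levels read from the text for Figs. 5 and 6. The function names are illustrative only.

```python
import math


def level_db(v):
    """Level in dB relative to the nonfaded in-line signal, eqs. (3) and (4)."""
    return 20.0 * math.log10(v)


def xpd_db(v_il, v_xp):
    """Cross-polarization discrimination in dB, eqs. (1) and (2)."""
    return level_db(v_il) - level_db(v_xp)


# Nonfading hour: in-line signal at 0 dB, interference near -20 dB.
xpd_nonfaded = xpd_db(1.0, 10 ** (-20 / 20))

# Deep fade at 8:44 a.m.: the in-line signal has faded 35 dB while the
# interference has only dropped from -20 dB to -35 dB.
xpd_faded = xpd_db(10 ** (-35 / 20), 10 ** (-35 / 20))

print(f"XPD, nonfading: {xpd_nonfaded:.1f} dB")
print(f"XPD, deep fade: {xpd_faded:.1f} dB")
```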

Although rain also causes depolarization, the measured XPD is
generally better than 10 dB, even during rain fades in excess of 40 dB.
Depolarization during multipath is therefore considerably more serious
than rain-caused depolarization in Western U.S.A., where rain-caused
outage is not the dominant controlling factor on radio-system reliability.

Figures 8 through 10 display the rms value of $v_{xp}$ conditioned to the
fade depth of the in-line signal $v_{IL}$. In the shallow-fade region (i.e.,
$V_{IL} \geq -10$ dB), $(v_{xp})_{rms}$ decreases almost linearly (dB by dB) with
$V_{IL}$, whereas in the deep-fade region (i.e., $V_{IL} \leq -20$ dB), $(v_{xp})_{rms}$
approaches a residual level, becoming independent of $V_{IL}$ as $V_{IL}$ decreases.
The residual depolarized components in these three data sets are between 37
and 50 dB below the nonfaded in-line signal level.

Figures 11 through 13 show the probability distributions of $V_{xp}$,
conditioned to a given fade depth of in-line signal, plotted on Rayleigh
probability paper. For deep fades of in-line signal (i.e., $V_{IL} \leq -30$ dB),
the probability distributions of $V_{xp}$ are approximately Rayleigh. This
Rayleigh distribution of $V_{xp}$, conditioned to deep fades of in-line signal,
was reported earlier in Ref. 1.

Fig. 5. Scintillations of in-line signal and cross-polarization interference on the 11-GHz
26.4-mile path (Atlanta-Palmetto) during a nonfading hour: August 28, 1974; 12:00 A.M.
to 1:00 P.M.

The solid lines in Figs. 8 through 13 are calculated results from a 
theoretical model discussed in the next section. 
Fig. 6. Fading behavior of in-line signal and cross-polarization interference on the
11-GHz 26.4-mile path (Atlanta-Palmetto) during a multipath fading hour: August 23,
1974; 8:00 A.M. to 9:00 A.M.

Fig. 7. Fading behavior of in-line signal and cross-polarization interference on the
11-GHz 15.9-mile path (Villa Rica-Palmetto) during a multipath fading hour: November
2, 1974; 9:00 A.M. to 10:00 A.M.


2.4 Theoretical model 


Based on early observations similar to those displayed in Figs. 6
through 10, Mottl⁶ suggests decomposition of the cross-polarization
interference $v_{xp}$ into two components:

$$v_{xp} = |k v_{IL} + \epsilon e^{j\phi}|, \tag{5}$$

where $k$ is a proportionality constant, and $\phi$ is the relative phase between
the proportionate component and the residual component.

For shallow fades of the in-line signal, $v_{xp}$ is dominated by $k v_{IL}$, and
decreases linearly with $v_{IL}$. During deep fades of the in-line signal, $v_{xp}$
is dominated by $\epsilon$, which is independent of $v_{IL}$, so the probability
distribution of $v_{xp}$ is essentially that of $\epsilon$.
The data in Figs. 11 through 13 indicate the conditional distribution of
$v_{xp}$. It is seen that $\epsilon$ is approximately Rayleigh, a single-parameter
distribution uniquely determined by its rms value. The rms values, $\epsilon_{rms}$,
are obtained from the measured distribution of $v_{xp}$ during deep fades
of $v_{IL}$, and they range from -36 to -48 dB, as listed in Table I.

For a given in-line signal level, eq. (5) indicates that $v_{xp}$ consists of a
constant vector (i.e., $k v_{IL}$) plus a Rayleigh vector. Such interpretation
immediately implies that, for a given in-line signal level, the
cross-polarization interference, $v_{xp}$, is Rice-Nakagami distributed.¹⁵-¹⁷ In other
words, the conditional distribution of $v_{xp}$ can be written as


$$P(v_{xp} \geq a \mid v_{IL}) = \int_a^\infty p(v_{xp} \mid v_{IL}) \, dv_{xp}, \tag{6}$$

where

$$p(v_{xp} \mid v_{IL}) = \frac{2 v_{xp}}{\epsilon_{rms}^2} \exp\left(-\frac{v_{xp}^2 + k^2 v_{IL}^2}{\epsilon_{rms}^2}\right) I_0\left(\frac{2 k v_{IL} v_{xp}}{\epsilon_{rms}^2}\right) \tag{7}$$

and $I_0(\cdot)$ denotes the modified Bessel function of zeroth order. Based
on the well-known properties of this Rice-Nakagami distribution,¹⁵-¹⁷
it is easily shown that

$$[v_{xp}(v_{IL})]_{rms} = [k^2 v_{IL}^2 + \epsilon_{rms}^2]^{1/2}, \tag{8}$$

from which

$$k = \frac{1}{v_{IL}} \left\{[v_{xp}(v_{IL})]_{rms}^2 - \epsilon_{rms}^2\right\}^{1/2} \tag{9}$$

for any $v_{IL}$, where $v_{xp}(v_{IL})$ denotes $v_{xp}$ conditioned to a given $v_{IL}$. This
equation shows that the proportionality constant $k$ in eq. (5) can be
calculated from measured values of $[v_{xp}(v_{IL})]_{rms}$ and $\epsilon_{rms}$. For example,
the data in Fig. 9 for the 11-GHz Atlanta-Palmetto path indicate

$$[v_{xp}(V_{IL} = 0\ \mathrm{dB})]_{rms} = -19.5\ \mathrm{dB} \tag{10}$$

and

$$\epsilon_{rms} = [v_{xp}(V_{IL} \leq -30\ \mathrm{dB})]_{rms} = -36.3\ \mathrm{dB}. \tag{11}$$

Substituting (10) and (11) into (9) yields

$$k = 0.105, \tag{12}$$

from which

$$K = 20 \log_{10} k = -19.6\ \mathrm{dB}. \tag{13}$$

Table I lists estimated values of $K$ for all three sets of data.

Fig. 8. Dependence of rms value of cross-polarization interference on in-line signal level
measured on the 6-GHz 26.4-mile path (Atlanta-Palmetto); measured 6.5-month data
(August 15, 1974 to February 28, 1975).

Fig. 9. Dependence of rms value of cross-polarization interference on in-line signal
measured on the 11-GHz 26.4-mile path (Atlanta-Palmetto); measured 6.5-month data
(August 15, 1974 to February 28, 1975).
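
The substitution of (10) and (11) into (9) can be checked numerically. The sketch below is illustrative only; the helper names `db_to_amplitude` and `estimate_k` are not from the original paper.

```python
import math


def db_to_amplitude(level_db):
    """Convert a level in dB (relative to the nonfaded in-line signal) to amplitude."""
    return 10 ** (level_db / 20.0)


def estimate_k(vxp_rms_db, eps_rms_db, v_il_db=0.0):
    """Proportionality constant k from eq. (9).

    vxp_rms_db : measured rms of v_xp at the in-line level v_il_db
    eps_rms_db : rms of the residual component, measured in deep fades
    """
    v_il = db_to_amplitude(v_il_db)
    vxp_rms = db_to_amplitude(vxp_rms_db)
    eps_rms = db_to_amplitude(eps_rms_db)
    return math.sqrt(vxp_rms ** 2 - eps_rms ** 2) / v_il


# Values from eqs. (10) and (11), 11-GHz Atlanta-Palmetto path.
k = estimate_k(vxp_rms_db=-19.5, eps_rms_db=-36.3)
K_db = 20.0 * math.log10(k)
print(f"k = {k:.3f}, K = {K_db:.1f} dB")
```

The same function applied to the other two rows of Table I should give K values close to those listed there.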
The Rice-Nakagami distribution (7) for $v_{xp}$ is completely determined
by two parameters: $\epsilon_{rms}$ and $k$. Since these two parameters are obtained
from the measured data, the Rice-Nakagami distribution of $v_{xp}$ for any
$v_{IL}$ can be calculated. Figures 11 through 13 show that the theoretical
distributions (solid lines) calculated from the two parameters ($\epsilon_{rms}$ and
$k$) agree very well with the measured data.

Similarly, eq. (8) indicates that $[v_{xp}(v_{IL})]_{rms}$ for any $v_{IL}$ is also
completely determined by the same parameters: $\epsilon_{rms}$ and $k$. Figures 8 through
10 show that the theoretical dependence of $[v_{xp}(v_{IL})]_{rms}$ calculated by
eq. (8) also agrees closely with the data.

Fig. 10. Dependence of rms value of cross-polarization interference on in-line signal
level measured on the 11-GHz 15.9-mile path (Villa Rica-Palmetto); measured 6.5-month
data (August 15, 1974 to February 28, 1975).
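
To make the two-parameter model concrete, the following sketch evaluates eq. (8) and the conditional tail probability of eq. (6) for the 11-GHz Atlanta-Palmetto parameters of Table I. It is a minimal illustration, not the computation used for Figs. 8 through 13. The tail is computed from the standard Poisson-mixture series for a constant vector plus a Rayleigh vector rather than by integrating eq. (7) directly; that choice, and all function names, are assumptions made here for convenience.

```python
import math

K_DB = -19.6        # 20 log10 k, Table I (11-GHz Atlanta-Palmetto)
EPS_RMS_DB = -36.3  # 20 log10 eps_rms, Table I


def vxp_rms_db(v_il_db, k_db=K_DB, eps_rms_db=EPS_RMS_DB):
    """Conditional rms of v_xp in dB for a given in-line level, eq. (8)."""
    power = 10 ** ((k_db + v_il_db) / 10.0) + 10 ** (eps_rms_db / 10.0)
    return 10.0 * math.log10(power)


def rice_tail(a, const, eps_rms):
    """P(|const + eps*exp(j*phi)| >= a) with eps Rayleigh of rms eps_rms.

    Equivalent to integrating eq. (7) from a to infinity with const = k*v_IL.
    """
    if a <= 0.0:
        return 1.0
    if a - const > 6.0 * eps_rms:   # tail is below exp(-36); treat as zero
        return 0.0
    if const - a > 6.0 * eps_rms:   # threshold far below the constant vector
        return 1.0
    mu = (const / eps_rms) ** 2     # constant-to-residual power ratio
    t = (a / eps_rms) ** 2          # normalized threshold power
    if mu == 0.0:
        return math.exp(-t)         # pure Rayleigh tail
    total, inner = 0.0, 0.0
    for j in range(int(mu + 10.0 * math.sqrt(mu) + 50.0) + 1):
        # inner accumulates the Poisson(t) probabilities for counts 0..j
        inner += math.exp(j * math.log(t) - t - math.lgamma(j + 1))
        # weight is the Poisson(mu) probability of count j
        total += math.exp(j * math.log(mu) - mu - math.lgamma(j + 1)) * inner
    return min(total, 1.0)


for fade_db in (0, -10, -20, -30, -40):
    print(f"V_IL = {fade_db:4d} dB -> (v_xp)rms = {vxp_rms_db(fade_db):6.1f} dB")
```

The printed values show the behavior described for Figs. 8 through 10: a dB-for-dB decrease in the shallow-fade region and a floor near the residual level in deep fades.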

At the present time, the physical interpretation of the theoretical
model (5) is speculative. The proportionality constant $k$ may be related
to antenna alignment, imperfections in the antennas, the waveguides,
or the channel-separation networks (designed for transmission of the
fundamental mode). In other words, the proportional component, $k v_{IL}$,
may be controllable by reducing these imperfections. On the other hand,
the Rayleigh distribution of the residual cross-polarized component, $\epsilon$,
suggests that $\epsilon$ may represent the sum of many small depolarized
components due to, say, foreground scattering,* the antenna cross-polarization
response to off-axis incoming rays, or the excitation of higher-order
modes in the waveguides by the off-axis incoming rays.†

Fig. 11. Rice-Nakagami distribution (with Rayleigh probability coordinates) of
cross-polarization interference, $V_{xp}$, conditioned to a given in-line signal level, $V_{IL}$, and
measured on the 6-GHz 26.4-mile path (Atlanta-Palmetto). The circles are measured
6.5-month data (August 15, 1974 to February 28, 1975).

For successful isolation of the two information channels of a
dual-polarized digital radio system, it is reasonable to expect that

$$K = 20 \log_{10} k \leq -20\ \mathrm{dB}. \tag{14}$$

Since the correlated component, $k v_{IL}$, of cross-polarization interference
always fades simultaneously with the desired signal and maintains a
carrier-to-interference ratio (CIR) of 20 dB or better, the degradation
of radio fade margin due to the correlated component, $k v_{IL}$, is quite
small, say 2 dB or less. On the other hand, the residual cross-polarization
interference $\epsilon$, being independent of the desired signal, will limit fade
margin. For example, for a typical 11-GHz digital radio, the thermal noise
is about 60 dB below the nonfaded signal, whereas $\epsilon_{rms}$ is only about 40
dB below the nonfaded signal (see Table I). Therefore, the residual
cross-polarization interference may reduce the fade margin by as much
as 20 dB and greatly increase the “multipath caused” outages.

* In the multipath propagation condition, the relative amplitudes and phases of the
foreground scattered components are quite different from those of direct paths. Therefore,
the sum of foreground scattered components is decorrelated from the sum of the direct
paths.

† Waveguide imperfections may couple depolarized components through higher-order
modes. Since these are dispersed¹⁸,¹⁹ in the circular waveguide and antenna responses to
off-axis incoming rays are mode-dependent, their sum is decorrelated from $v_{IL}$.

Fig. 12. Rice-Nakagami distribution (with Rayleigh probability coordinates) of
cross-polarization interference, $V_{xp}$, conditioned to a given in-line signal level, $V_{IL}$, and
measured on the 11-GHz 26.4-mile path (Atlanta-Palmetto). The circles are measured
6.5-month data (August 15, 1974 to February 28, 1975).


III. MULTIPATH OUTAGES OF DUAL-POLARIZATION, 11-GHz,
PHASE-SHIFT-KEYED DIGITAL RADIO


3.1 Introduction 


Section 3.2 presents a model for the performance of quaternary-co- 
herent-phase-shift-keyed (QCPSK) digital radio subjected to interference 
and noise. Section 3.3 outlines the procedure and states the assumptions 
needed for calculation of multipath-caused outage probabilities of 
dual-polarization, 11-GHz, QCPSK digital radio. Section 3.4 summarizes 
the results of outage calculations. 
Fig. 13. Rice-Nakagami distribution (with Rayleigh probability coordinates) of
cross-polarization interference, $V_{xp}$, conditioned to a given in-line signal level, $V_{IL}$,
measured on the 11-GHz 15.9-mile path (Villa Rica-Palmetto). The circles are measured
6.5-month data (August 15, 1974 to February 28, 1975).


Rain-caused outage probabilities of 11-GHz radio are treated
elsewhere.²⁰,²¹ This paper, therefore, treats only multipath-caused
outages.


3.2 Digital radio model 


We model the cochannel, dual-polarization, 11-GHz digital radio as 
a QCPSK system corrupted by complex gaussian noise with one major 
interference representing the depolarized component of the cochannel 
cross-polarized signal. 

Both noise and interference cause digital transmission errors. Figure
14 shows the relationship between the carrier-to-noise ratio (CNR) and
carrier-to-interference ratio (CIR) for a fixed bit-error-rate (BER). In Fig.
14 the circles represent the measured performance²² of a prototype
QCPSK digital radio; the solid lines are approximations described by

$$10^{-\mathrm{CNR}/10} + 10^{-(\mathrm{CIR} + A)/10} = 10^{-\mathrm{CNR}_0/10}, \tag{15}$$

or equivalently,

$$\mathrm{CIR} = -10 \log_{10}\left(10^{-\mathrm{CNR}_0/10} - 10^{-\mathrm{CNR}/10}\right) - A, \tag{16}$$

where

$$\mathrm{CNR}_0 = \begin{cases} 13.4\ \mathrm{dB} & \text{for BER} = 10^{-3} \quad (17) \\ \text{[illegible] dB} & \text{for BER} = 10^{-6} \quad (18) \end{cases}$$

and

$$A = \begin{cases} 3.8\ \mathrm{dB} & \text{for BER} = 10^{-3} \quad (19) \\ \text{[illegible] dB} & \text{for BER} = 10^{-6} \quad (20) \end{cases}$$

Equation (15) means that for a given BER threshold, the interference
power reduced by $A$ dB plus noise power is a constant. In other words,
the QCPSK radio system is more resistant to interference than to noise
by $A$ dB.

Fig. 14. Relationship between carrier-to-interference ratio and carrier-to-noise ratio
for a given bit-error-rate (BER) of QCPSK radio.

The probability of outage of the QCPSK radio for a given BER threshold 
is simply the integral of the joint CNR, CIR probability density function 
over the two-dimensional region lying below the BER threshold curve 
in Fig. 14. This double integration is carried out by eqs. (27) and (31) in 
the next section. 
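
The threshold curve of eq. (16) is simple to evaluate. The sketch below uses the BER = 10⁻³ parameters quoted in this paper (CNR₀ = 13.4 dB, A = 3.8 dB); the function name and the convention of returning infinity when noise alone reaches the BER threshold are assumptions made for illustration.

```python
import math

CNR0_DB = 13.4  # CNR required at BER = 1e-3 with no interference, eq. (17)
A_DB = 3.8      # interference advantage at BER = 1e-3, eq. (19)


def cir_threshold_db(cnr_db, cnr0_db=CNR0_DB, a_db=A_DB):
    """CIR required to hold the BER threshold at a given CNR, eq. (16).

    Returns math.inf when the noise alone already reaches the threshold,
    so that no finite CIR is sufficient.
    """
    headroom = 10 ** (-cnr0_db / 10.0) - 10 ** (-cnr_db / 10.0)
    if headroom <= 0.0:
        return math.inf
    return -10.0 * math.log10(headroom) - a_db


for cnr in (14.0, 16.0, 20.0, 30.0, 67.4):
    print(f"CNR = {cnr:5.1f} dB -> required CIR = {cir_threshold_db(cnr):5.2f} dB")
```

At high CNR the required CIR approaches CNR₀ - A = 9.6 dB, in line with the statement elsewhere in the paper that QCPSK radio needs about 10 dB of CIR at this error rate.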


3.3 Outline of the outage estimation procedure 


(a) The nonfaded carrier-to-noise ratio, $\mathrm{CNR}_{NF}$, for a reference
25-mile hop length is assumed to be 67.4 dB.²²
(b) The dependence of $\mathrm{CNR}_{NF}$ on hop length $D$ is

$$\mathrm{CNR}_{NF}(D) = \mathrm{CNR}_{NF}(D_0) - 20 \log_{10}\frac{D}{D_0}, \tag{21}$$

where

$$D_0 = 25\ \text{miles}. \tag{22}$$

(c) With the desired, in-line signal faded $\alpha$ dB, the effective CNR(D)
for path length $D$ is

$$\mathrm{CNR}(D) = \mathrm{CNR}_{NF}(D) - \alpha, \tag{23}$$

where

$$\alpha = -20 \log_{10} v_{IL}\ \mathrm{dB}. \tag{24}$$

(d) For a given BER (say $10^{-3}$) at outage threshold, the CNR versus
CIR curve in Fig. 14 means that the cross-polarization interference, $v_{xpo}$,
at outage threshold is a function of in-line signal level, $v_{IL}$. By combining
eqs. (15), (23), and (24), it can be shown that

$$v_{xpo}(v_{IL}) = v_{IL} \left\{10^{-\mathrm{CNR}_0/10} - 10^{-(\mathrm{CNR}_{NF} - \alpha)/10}\right\}^{1/2} 10^{A/20}. \tag{25}$$

(e) For a given in-line signal level, the probability of outage is

$$P(\text{outage} \mid v_{IL}) = P(v_{xp} \geq v_{xpo} \mid v_{IL}) \tag{26}$$

$$= \int_{v_{xpo}}^{\infty} p(v_{xp} \mid v_{IL}) \, dv_{xp}, \tag{27}$$

where $p(v_{xp} \mid v_{IL})$ is given by eq. (7).
(f) The distribution of the in-line signal, $v_{IL}$, without diversity
protection is assumed to be Rayleigh, i.e.,

$$p(v_{IL}) = 2 v_{IL} e^{-v_{IL}^2}. \tag{28}$$

(g) The distribution of the in-line signal, $v_{IL}$, with space diversity and
selective switching is²³,²⁴

$$p(v_{IL}) = \frac{8 v_{IL}}{1 - \rho^2} \int_0^{v_{IL}} z \, I_0\left(\frac{2 \rho z v_{IL}}{1 - \rho^2}\right) \exp\left(-\frac{z^2 + v_{IL}^2}{1 - \rho^2}\right) dz, \tag{29}$$

where $\rho^2$ is approximately equal to the correlation coefficient between
the two input signals received from the space-diversity antenna pair.²⁵

The dependence of $\rho^2$ on antenna spacing $S$ is obtained empirically
as²⁶,²⁷

$$\rho^2 = 1 - 7 \times 10^{-5} \, \frac{S^2 f}{D}, \tag{30}$$

where

S = antenna center-to-center separation in feet,

f = radio frequency in GHz,

D = path length in miles.

Strictly speaking, assumptions (f) and (g) are valid only for the deep-fade
region, but not the shallow-fade region. However, outages occur mostly
in the deep-fade region (see Figs. 15 and 16); therefore, assumptions (f)
and (g) are acceptable for outage calculations.

Notice that the selective-switching scheme always connects the re- 
ceiver to the better one of the two input signals and is an idealized switch. 
In practice, there are many different ways of utilizing the two input- 
diversity signals. For example, threshold-blind switching provides less 
improvement, whereas equal-gain combining or maximum-ratio com- 
bining provides more improvement than the selective switching.!? 
Furthermore, a digital radio may be caused to switch at a given error-rate 
threshold rather than through the signal amplitudes of the two input 
signals. In this paper, we use selective switching for ease of computation 
and thus provide only an estimation of feasible diversity improvement. 
A full-scale investigation of various diversity-protection schemes is be- 
yond the scope of this paper. 

(h) The total outage probability is

$$P_{outage} = \int_0^\infty P(\text{outage} \mid v_{IL}) \, p(v_{IL}) \, dv_{IL}$$

$$= \int_0^\infty P(v_{xp} \geq v_{xpo} \mid v_{IL}) \, p(v_{IL}) \, dv_{IL}, \tag{31}$$

and the total two-way outage time is

$$T_{outage} = 2 r T_0 P_{outage}\ \text{minutes/year}, \tag{32}$$

where $T_0 = 525{,}600$ minutes is the total annual time and $r$ is the
multipath occurrence factor,¹⁴,²⁷ which depends on many factors such as radio
frequency, path length, path terrain, and geographic location. Figures
15 and 16 show the integrand, $[P(\text{outage} \mid v_{IL}) \, p(v_{IL})]$, as a function of fade
depth of the in-line signal. Most outage occurs at about 12 dB above the
rms power level (i.e., $20 \log_{10} \epsilon_{rms}$) of the residual cross-polarization
interference $\epsilon$. This is because QCPSK radio needs at least 10 dB of CIR at
BER = $10^{-3}$ and the correlated component, $k v_{IL}$, of cross-polarization
interference adds another 2-dB requirement. Therefore, the “effective
outage threshold” of dual-polarization QCPSK radio is approximately
12 dB* above $\epsilon_{rms}$. For example, the 26.4-mile path has an effective fade
margin of only 25 dB.

Notice that the thermal noise is 67 dB below the nonfaded signal level
whereas this outage threshold for the 26.4-mile path is only 25 dB below
the nonfaded signal level. Therefore, “multipath caused” outage of
dual-polarization radio is strongly controlled by $\epsilon_{rms}$.

Fig. 15. Dependence of one-way outage probability density function on multipath fade
depth of in-line signal of 11-GHz QCPSK radio on a 15.9-mile path with and without
space-diversity protection. (Plot parameters: BER = $10^{-3}$; $20 \log_{10} \epsilon_{rms}$ = -45.3 dB;
$20 \log_{10} k$ = -46.2 dB; $\mathrm{CNR}_{NF}$ = 71.3 dB. Curves: no space diversity; space diversity
with 50-ft antenna spacing. Axes: outage probability in percent per dB of fade depth of
in-line signal versus fade depth of in-line signal.)

Fig. 16. Dependence of one-way outage probability density function on multipath fade
depth of in-line signal of 11-GHz QCPSK radio, 26.4-mile path with and without space
diversity protection. Data based on multipath-caused one-way outage. (Plot parameters:
BER = $10^{-3}$; $20 \log_{10} \epsilon_{rms}$ = -37 dB; $20 \log_{10} k$ = -19.6 dB; $\mathrm{CNR}_{NF}$ = 67 dB.
Curves: no space diversity; space diversity with 50-ft antenna spacing. Axes: outage
probability in percent per dB of fade depth of in-line signal versus fade depth of
in-line signal.)

(i) According to assumptions (f) and (h), the distribution of $v_{IL}$ in the
deep-fade region is

$$T(v_{IL} \leq L) \approx r T_0 L^2 \quad \text{for } L \leq 0.1, \tag{33}$$

where $T(v_{IL} \leq L)$ denotes the accumulated time per year that $v_{IL}$ fades
below $L$. The value of the multipath occurrence factor, $r$, can be determined
empirically by fitting eq. (33) to the measured data on $T(v_{IL} \leq L)$. The
estimated values of $r$ for a 15.9-mile path and a 26.4-mile path near
Atlanta, Georgia are given in Table I.
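
Steps (a) through (h) can be strung together in a short numerical sketch for the case without diversity protection. This is an illustration under stated assumptions, not the program used to produce the figures in this paper: the Rice-Nakagami tail of eq. (27) is evaluated with a Poisson-mixture series instead of direct integration of eq. (7), the integral of eq. (31) is taken by a midpoint rule in ln(v_IL) over a truncated range, and the BER = 10⁻³ values of CNR₀ and A are used. All function names are hypothetical.

```python
import math

T0_MINUTES = 525_600.0  # total minutes per year, eq. (32)


def rice_tail(a, const, eps_rms):
    """P(|const + eps*exp(j*phi)| >= a), eps Rayleigh with rms eps_rms (eq. 27)."""
    if a <= 0.0:
        return 1.0
    if a - const > 6.0 * eps_rms:   # tail below exp(-36); treat as zero
        return 0.0
    if const - a > 6.0 * eps_rms:
        return 1.0
    mu = (const / eps_rms) ** 2
    t = (a / eps_rms) ** 2
    if mu == 0.0:
        return math.exp(-t)
    total, inner = 0.0, 0.0
    for j in range(int(mu + 10.0 * math.sqrt(mu) + 50.0) + 1):
        inner += math.exp(j * math.log(t) - t - math.lgamma(j + 1))
        total += math.exp(j * math.log(mu) - mu - math.lgamma(j + 1)) * inner
    return min(total, 1.0)


def cnr_nf_db(d_miles, cnr_ref_db=67.4, d0_miles=25.0):
    """Nonfaded CNR versus hop length, steps (a) and (b), eq. (21)."""
    return cnr_ref_db - 20.0 * math.log10(d_miles / d0_miles)


def vxp_threshold(v_il, cnr_nf, cnr0_db=13.4, a_db=3.8):
    """Interference amplitude at outage threshold, eq. (25)."""
    cnr = cnr_nf + 20.0 * math.log10(v_il)       # eqs. (23) and (24)
    headroom = 10 ** (-cnr0_db / 10.0) - 10 ** (-cnr / 10.0)
    if headroom <= 0.0:
        return 0.0                               # noise alone causes outage
    return v_il * math.sqrt(headroom) * 10 ** (a_db / 20.0)


def outage_probability(k, eps_rms, cnr_nf, steps=2000):
    """Eq. (31) with the Rayleigh density of eq. (28), no diversity."""
    lo, hi = math.log(1e-5), math.log(4.0)       # truncated range of v_IL
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = math.exp(lo + (i + 0.5) * du)        # midpoint in u = ln(v_IL)
        p_out = rice_tail(vxp_threshold(v, cnr_nf), k * v, eps_rms)
        total += p_out * 2.0 * v * v * math.exp(-v * v) * du  # p(v) dv
    return total


# 11-GHz Atlanta-Palmetto path, parameters from Table I.
k = 10 ** (-19.6 / 20.0)
eps_rms = 10 ** (-36.3 / 20.0)
p_out = outage_probability(k, eps_rms, cnr_nf_db(26.4))
t_out = 2.0 * 0.4 * T0_MINUTES * p_out           # eq. (32) with r = 0.4
print(f"P_outage = {p_out:.2e}, two-way outage = {t_out:.0f} minutes/year")
```

Extending the sketch to space diversity would replace the Rayleigh density with eq. (29); that case is omitted here to keep the example short.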


3.4 Results 


To calculate the outage probability of a radio path, we need the
parameters $r$, $k$, and $\epsilon_{rms}$ on that path. At the present time, these parameters
are available only for two paths (15.9 and 26.4 miles; see Table I)
measured near Atlanta, Georgia. This limits our calculations to these
two particular path lengths only. The linear interpolation between these
two path lengths in Figs. 17 through 19 gives a crude estimate of outage
probabilities for intermediate path lengths.


3.4.1 Effect on dual-polarization transmission 


Figure 17 displays the calculated outage probabilities without diver- 
sity protection. The upper curve represents cochannel dual-polarization 
transmission, whereas the lower curve represents idealized single-po- 
larization performance without cross-polarization interference and 
without adjacent channel interference. It is seen that the impact of a dual 
channelization is three orders of magnitude increase in multipath outage 
time. This is because the residual cross-polarization interference, $\epsilon_{rms}$,
is 30 dB higher than the thermal noise.

However, the idealized single-polarization performance, represented
by the lower curve in Figure 17, is academic in that the impacts of
adjacent channel interferences and channel dispersion during multipath
fading are neglected, resulting in more than 54 dB of multipath fade
margin. For a practical single-polarization-per-frequency digital system,
the adjacent channel interferences and channel dispersion will limit the
multipath fade margin to less than 35 dB and significantly increase the
multipath outage time with reference to the idealized curve.

* 16 dB for BER = $10^{-6}$.

Fig. 17. The impact of cochannel dual-polarization transmission on multipath two-way
outage probability of 11-GHz QCPSK radio. No space-diversity protection was provided.
(Plot parameters: QCPSK; BER = $10^{-3}$ at outage threshold; $\mathrm{CNR}_{NF}$ (25 mile) = 67.4 dB.
Curves: two channels per frequency; idealized, one channel per frequency without
adjacent channel interferences. Axes: outage time per hop in minutes per year versus
hop length in miles.)

666 THE BELL SYSTEM TECHNICAL JOURNAL, MAY-JUNE 1977


3.4.2 Space-diversity improvement 


Figure 18 displays the reduction of multipath outage time through 
the use of space diversity. With 50-foot antenna spacing, the outage time 
is reduced by one order of magnitude for the 26.4-mile path and two 
orders of magnitude for the 15.9-mile path. The diversity improvement 
factor achieved by the dual-polarization digital radio is generally much 
smaller than we familiarly associate with a single-polarization analog 
radio.²⁴,²⁶,²⁷ This is because the dual-polarization digital radio has an
effective fade margin of only 20 to 30 dB, whereas single-polarization 
analog radio generally provides a fade margin of 35 dB or more. Diversity 
improvement is proportional to fade margin. 
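
For the 50-foot spacing discussed above, the correlation parameter of eq. (30) can be evaluated for the two measured paths. The sketch assumes that eq. (30) has the form ρ² = 1 - 7×10⁻⁵ S² f / D, with S in feet, f in GHz, and D in miles; the function name is illustrative.

```python
def rho_squared(spacing_ft, freq_ghz, path_miles):
    """Correlation parameter of the diversity pair, assumed form of eq. (30)."""
    return 1.0 - 7.0e-5 * spacing_ft ** 2 * freq_ghz / path_miles


# 11-GHz paths of Table I with 50-ft antenna spacing.
rho2_atlanta = rho_squared(50.0, 11.605, 26.4)
rho2_villa_rica = rho_squared(50.0, 11.465, 15.9)
print(f"Atlanta-Palmetto:    rho^2 = {rho2_atlanta:.3f}")
print(f"Villa Rica-Palmetto: rho^2 = {rho2_villa_rica:.3f}")
```

The shorter path gives the smaller correlation for the same spacing, which is consistent with the larger diversity improvement reported for the 15.9-mile path.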


3.4.3 Effect of error-rate requirement 


Figure 19 shows that tightening the BER requirement at outage
threshold from $10^{-3}$ to $10^{-6}$ increases the outage time by a factor of 6,
even with diversity protection. This sensitivity is related to the steep
inverse slope, 5 dB per decade, of fading probability of a dual-diversity
signal.¹² Tightening the BER from $10^{-3}$ to $10^{-6}$ is equivalent to a 4-dB (i.e.,
16 to 12 dB) loss of effective fade margin.


3.4.4 Effects of $\epsilon_{rms}$ and k of interference


Figure 20 indicates that the outage probability is extremely
sensitive to the power level, $\epsilon_{rms}$, of the residual cross-polarization
interference. A 10-dB decrease in $\epsilon_{rms}$ will reduce outage probability by two
orders of magnitude. Again, this is related to the steep, inverse slope of 5
dB per decade of fading probability for a dual-diversity signal.
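
Both sensitivity figures quoted in Sections 3.4.3 and 3.4.4 follow from the same slope. As a minimal arithmetic sketch (the function name is illustrative), a change of M dB in effective fade margin at an inverse slope of s dB per decade changes the outage probability by a factor of 10^(M/s):

```python
def outage_factor(margin_change_db, slope_db_per_decade):
    """Change in outage probability for a given change in effective fade margin."""
    return 10 ** (margin_change_db / slope_db_per_decade)


# Dual-diversity signal: inverse slope of 5 dB per decade.
ber_factor = outage_factor(4.0, 5.0)    # 4-dB margin loss from tightening the BER
eps_factor = outage_factor(10.0, 5.0)   # 10-dB change in the residual level
print(f"4 dB  -> factor of {ber_factor:.1f}")
print(f"10 dB -> factor of {eps_factor:.0f}")
```

The first value rounds to the factor of 6 read from Fig. 19, and the second is the two orders of magnitude read from Fig. 20.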

On the other hand, Figure 21 shows that the outage probability is
practically independent of the correlated component ($k v_{IL}$) of
cross-polarization interference as long as $20 \log_{10} k \leq -30$ dB. This is intuitively
obvious because this component of CIR is constant. As long as CIR ≥ 30
dB, the interference has negligible effect on QCPSK radio (see CNR versus
CIR curve in Fig. 14). In practice, this means it is of questionable benefit
to improve XPD of a radio link beyond 30 dB during periods of normal
propagation. The controlling factor on outage time is the residual
cross-polarization, $\epsilon$, which is typically 10 dB or more below $k v_{IL}$ and is
almost unobservable during a nonfading period. Therefore, the
understanding and identification of major contributors of the residual
cross-polarization interference $\epsilon$ are important subjects for future study.

Fig. 18. Effect of space-diversity protection on multipath two-way outage probability
of 11-GHz dual-polarization QCPSK radio. (Curves include space diversity with 25-ft
antenna spacing.)

Fig. 19. Effect of bit-error-rate requirement at outage threshold on multipath two-way
outage probability of 11-GHz dual-polarization QCPSK radio, with space diversity and
50-foot antenna spacing. (Axes: outage time per hop in minutes per year versus hop
length in miles.)

Fig. 20. Sensitivity of multipath two-way outage probability of 11-GHz dual-polarization
QCPSK radio on the rms value, $\epsilon_{rms}$, of the residual component of cross-polarization
interference, with space diversity and 50-foot antenna spacing. (Horizontal axis:
residual cross-polarization interference, $20 \log_{10} \epsilon_{rms}$, in dB with respect to
nonfaded in-line signal level.)

Fig. 21. Dependence of multipath two-way outage probability of 11-GHz dual-polarization
QCPSK radio on the correlated component, $k v_{IL}$, of cross-polarization interference,
with space diversity, 50-foot antenna spacing, and 26.4-mile path. (Horizontal axis:
$20 \log_{10} k$ in dB, correlated component.)

3.4.5 Effect of thermal noise 

The effects of varying CNRyr on outage probability are shown in 
Fig. 22. Since CNR = 13.4 dB at BER = 10⁻³, assumption (a) implies
a 54-dB fade margin* in the absence of the interference. Figure 22
shows it is also not of benefit to suppress the thermal noise level below
the level, ε_rms - A, of residual cross-polarization interference as far as
multipath outage† is concerned. As long as the noise level is below ε_rms
- A, the multipath outage is controlled by ε_rms and is independent of
noise level.

* The fade margin is reduced to 48 dB during rain because of 4-dB wet radome
attenuation and 2-dB degradation by rain-induced depolarization.
† On the other hand, reducing the noise level will reduce rain-caused outage time.

Fig. 22. Effect of reducing thermal noise level on the multipath two-way
outage probability of 11-GHz dual-polarization QCPSK radio, with space
diversity, 50-foot antenna spacing, and 26-mile path length.


3.5 Some qualifications
3.5.1 Effect of channel dispersion during multipath fading


A quaternary PSK signal is equivalent to two streams of binary PCM
signals, each phase modulating a carrier of the same frequency, but with
90-degree phase shift.²² Therefore, cochannel dual-polarization QCPSK
radio requires the following two orthogonalities for successful
transmission:


(i) Polarization orthogonality between the cochannel pair.
(ii) Phase orthogonality between the two streams of binary PSK signals.


Multipath fading, with channel dispersion, will degrade both polarization 
and phase orthogonalities. At the present time, our understanding of 
degradation of phase orthogonality during multipath fading is insuffi- 
cient for outage calculations. Therefore, the possible outages due to 
crosstalk between the two orthogonally phased carriers are unaccounted 
for in this paper. 


3.5.2 Limitation of available path parameters 


All the calculations in this report are based on measured path
parameters: r, k, and ε_rms, pertinent to two paths near Atlanta, Georgia.
The variations of k and ε_rms parameters with time base, path length, and
geographic location are not understood.


3.5.3 Relationship between CNR and CIR


The CNR versus CIR relationship for QCPSK radio (Fig. 14) is based 
on presently available hardware technology. This behavior is several dB 
poorer than the idealized relationship in Refs. 28 and 29. Future ad- 
vances may improve this behavior and reduce the outage time. 


IV. CONCLUSION 


A microwave propagation experiment near Atlanta, Georgia has 
provided statistics describing depolarization during multipath fading 
at 6 and 11 GHz on two paths (15.9 and 26.4 miles). Data gathered over 
a 6.5-month period have been processed and presented. These data are 
supportive of a model in which the depolarized signal (interference) 
consists of a component correlated with the in-line signal, and a residual 
component independent of the in-line signal. The residual component 
is Rayleigh distributed with a typical rms value about 40 dB below the 
nonfaded in-line signal. The cross-polarization interference, conditioned
to a given in-line signal level, is Rice-Nakagami distributed. The
calculated Rice-Nakagami distribution fits these 6.5-month data very
closely.

The experimental data and theoretical model are applied to estimate 
the multipath outage probability of cochannel dual-polarization 11-GHz 
QCPSK digital radio affected by thermal noise and cross-polarization 
interference during multipath fading. Detailed results are given in 
Section 3.4. 

A statistical model for outages that occur in a dual-polarized-frequency
radio channel during periods of multipath fading is proposed,
and an analytical expression for outage time obtained. The model, 
which results in Rice-Nakagami statistics for the cochannel interfer- 
ence signal, describes channel outage time as a function of several en- 
vironmental and radio-system parameters. The formulation obtained 
allows for efficient parametric studies to evaluate the importance of
these parameters to channel outage time and to examine parameter 
sensitivity questions. Results of practical significance relative to 
hardware XPD requirements, maximum hop length, system gain, de- 
pendence on geographic environment, and digital terminal perfor- 
mance characteristics are obtained for present 11-GHz QCPSK (qua- 
ternary-coherent-phase-shift-keyed) digital radio systems. Estimates 
of dual-polarized-frequency channel outage time are obtained for a 
variety of representative system parameter values and compared with 
expected outage times for a conventional channel. Particular attention 
is given to the mechanism of channel outages during multipath fading, 
and several potential means for control and/or reduction of channel 
outage time are discussed. 


I. INTRODUCTION 
1.1 DPF radio systems 


Federal regulatory requirements for the channel capacity of an 11-GHz 
digital radio system are achieved in some short-haul radio systems by 
a diplex approach that uses each available radio-channel frequency band 
twice. Such designs, denoted here as dual-polarized-frequency (DPF) 
radio systems, are based on the high isolation theoretically obtainable 
between the two linear polarizations (vertical and horizontal) available 
for propagation at each radio-wave frequency. 

Practical implementation of a DPF radio system relies on the attainability
of high polarization isolation in the radio system, in the antenna
system, and over the propagation path, and on the maintainability of 
these conditions. At any given frequency, degradation in the isolation 
between the orthogonally polarized signals can result in cochannel in- 
terference, with attendant degradation in the channel bit-error rate 
(BER). 

This paper presents a theoretical modeling study of DPF radio channel 
reliability during periods of multipath fading. Using channel bit-error 
rate as a criterion, the effects of the polarization-isolation properties of 
system hardware and the propagation path on channel-outage time are 
evaluated. The sensitivity of bit-error rate to system hardware and 
propagation-path characteristics is of particular interest because of their 
direct impact on feasibility and cost of DPF radio systems. 

A complete listing of all symbols and notation used in this paper is 
provided in Appendix A. 


1.2 Linear polarization isolation 


During periods of multipath fading activity, measurements on 11-GHz 
radio waves propagated along line-of-sight paths display statistical 
fluctuations in the isolation between orthogonal linearly polarized waves 
at the same frequency.!“ A convenient measure of the isolation between 
two cross-polarized signals—i.e., signals transmitted via such orthogo- 
nally polarized waves—is the cross-polarization-discrimination ratio 
or XPD. The XPD is defined as the ratio (usually expressed in dB) of the 
energy received on a reference polarization that results from a signal 
transmitted with the same polarization (the copolarized, desired signal) 
to the energy received on this same polarization but transmitted with 
the orthogonal polarization (the cross-polarized or interference sig- 
nal). 
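
As a small illustration of the XPD definition above, the sketch below converts a pair of received levels into XPD in dB. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def xpd_db(copolar_power, crosspolar_power):
    """XPD in dB from the copolarized and cross-polarized received powers
    (both measured on the same reference polarization, same units)."""
    if copolar_power <= 0 or crosspolar_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(copolar_power / crosspolar_power)

def xpd_db_from_voltages(copolar_voltage, crosspolar_voltage):
    """Same ratio expressed with envelope voltages (20 log10 instead of 10 log10)."""
    if copolar_voltage <= 0 or crosspolar_voltage <= 0:
        raise ValueError("voltages must be positive")
    return 20.0 * math.log10(copolar_voltage / crosspolar_voltage)

# Hypothetical example: interference 1000 times weaker in power than the desired signal
print(xpd_db(1.0, 1.0e-3))
```

A power ratio of 1000 corresponds to an XPD of 30 dB; the voltage form gives the same figure for a voltage ratio of about 31.6.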


1.3 Multipath outage estimates 


Estimates of single-hop (point-to-point) outage probabilities for 
11-GHz QCPSK digital radio systems were first obtained by Lin in 1973 
based in part on a one-parameter model for the RMS-XPD ratio during 
multipath fading. In this model, at any fixed fade depth in the deep-fade 
region, the cross-polarized voltage envelope varies as a Rayleigh dis- 
tributed random variable, and the RMS-XPD behaves as a linear function 
of the fade depth in dB, as shown by curve “L” in Fig. 1. 

Some early experimental data⁴ showed a nonlinear dependence of
RMS-XPD vs fade depth, with linear proportionality between RMS-XPD
and fade depth only during deep multipath fading, e.g., fades greater
than 10 to 15 dB. Curve NL in Fig. 1 is a plot of one sample data set that
shows this type of behavior. A model for the cross-polarized signal that
accounts for this observed behavior is proposed and developed in detail
in this paper.

Fig. 1. RMS value of cross-polarized interference voltage as a function of
fade depth (Atlanta to Palmetto, Georgia, June 1974, 11 GHz). Axes: RMS value
of XPD in decibels versus fade depth in decibels.


This proposed model for cross-polarized interference describes the
signal as a sum of two components

$$ V_{TX} = V_{DX} + V_{RX}, \qquad (1) $$

where

V_TX = peak amplitude of the voltage envelope of the total signal
received due to the transmitted cross-polarized signal.

V_DX = peak amplitude of the envelope of a “direct” received
component, due to the transmitted cross-polarized signal.

V_RX = peak amplitude of the envelope of a random received
component, due to the transmitted cross-polarized signal.


The component V_DX derives from imperfect polarization isolation in
system hardware, and is subject to fading behavior identical to that of
the copolarized signal. The random component V_RX is attributed to
random spatial and temporal conditions that characterize the propagation
path, and it is not subject to fading, thereby giving rise to the
observed behavior of curve NL in Fig. 1. This random component is
described here as a gaussian random process, which, as shown below, results
in V_TX becoming a Rice-Nakagami distributed random variable. This
result is supported by experimental data taken at Palmetto, Georgia,
as discussed by Lin.⁴

Use of a gaussian model for V_RX is based on the multiplicity of effects
that can contribute to this component and on the conjecture that ran- 
dom scattering is a dominant mechanism. The generalized parametric 
modeling study of outage statistics carried out in this paper is predicated 
in part on this latter assumption. Without a model for the dependence 
of the random component of the interference signal on path length, 
outage-time estimates are limited to those few hop lengths where suffi- 
cient experimental data is available to specify the magnitude of this 
component. 

In a companion paper in this issue, Lin⁴ also uses the proposed
Rice-Nakagami model for cross-polarized interference voltage to com-
pute, by essentially the same method, estimates of DPF channel multi- 
path outage time for QCPSK systems for two existing radio hops with and 
without space-diversity protection. To provide perspective on some of 
the principal differences between Lin’s study and ours, the following 
comments are provided: 


(i) Lin’s analysis of experimental propagation data provides support 
for the Rice-Nakagami model for cross-polarized interference. 

(ii) Estimates of outage time for two specific hop lengths are provided 
by Lin, whereas in this study hop length is a free model parameter. As 
a result, in Lin’s study, outage time for intermediate hop lengths must 
be inferred through linear interpolation between the two point estimates. 
The physical conditions corresponding to such interpolated values are 
not clear, since the two (experimental radio path) interpolation end 
points correspond to different hardware and propagation path charac- 
teristics. Also, this linear interpolation, performed on semilog outage 
time plots, corresponds to a simple power law dependence of outage time 
on path length. 

(iii) Some effects of space diversity on multipath outage time are
considered in Ref. 4, but not here. 

(iv) The sensitivity of multipath outage time to differences between 
theoretical and experimental DPF channel performance characteristics, 
and the implications for potential outage time reductions, are presented 
in this study, but not in Ref. 4. 


II. DPF RADIO-CHANNEL PERFORMANCE DURING MULTIPATH FADES


2.1 Multipath-fading outage probability 


During multipath fading, a DPF radio-channel outage occurs whenever
the channel BER exceeds some defined threshold for acceptable service.
Two service outage thresholds are considered here: BER = 10⁻³ and BER
= 10⁻⁶. A service outage results if


(i) a multipath fade occurs and exceeds the thermal-noise fade margin
(conventionally defined fade margin), or

(ii) the polarization isolation (XPD) of the DPF channel degrades
below that necessary to maintain the required error rate (assuming both
polarizations are in service).

The effect on outage time of spectral distortions due to the frequency 
selectivity of multipath fades is not considered here. As a result, opti- 
mistic estimates of multipath fading outages are obtained, and can be 
viewed as lower bounds for actual outage times. 

Under the condition that multipath fading is present, the probability
of a DPF digital-radio channel outage occurring as a result of a multipath
fade on a single radio hop is given by the expression

$$ P_{OUT} = \int_0^{\infty} P_c(\mathrm{BER} \ge \mathrm{BER}_0 \,|\, L)\, p_r(L)\, dL, \qquad (2) $$

where L = peak envelope voltage normalized to the nominal (unfaded)
value, i.e., the ratio of faded to unfaded peak amplitude of the voltage
envelope, and where P_c(BER ≥ BER₀|L) is the complement of the
conditional probability distribution for the channel bit-error rate (i.e., the
probability that the BER of the channel exceeds that for acceptable
service (BER₀), at a given fade level L), and the function p_r(L) is the
probability density function for the amplitude of the multipath fading
signal.

The conditional probability P_c for exceeding a given bit-error rate
BER₀, given a fixed fade level L, is defined by the relation

$$ P_c(\mathrm{BER} \ge \mathrm{BER}_0 \,|\, L) =
\begin{cases}
\int_{V_M(L)}^{\infty} p_{VX}(\xi \,|\, L)\, d\xi & \text{for } L > L_m(D), \\
1.0 & \text{for } L < L_m(D),
\end{cases} \qquad (3) $$

where p_VX is the conditional density function for the cross-polarized
interference-signal envelope, V_M(L) is the maximum value of V_TX for
which the desired BER₀ can be obtained with the channel signal faded
to L, and L_m(D) is the minimum-allowable normalized amplitude of the
received-signal envelope for a path length D required to maintain some
specified maximum channel BER₀, i.e., the fade level at the system
thermal-noise threshold. [See Section 3.1 for a graphical interpretation
and explanation of (3) and (4).]

It is instructive to rewrite (2) in the following form, using the definition
in (3):

$$ P_{OUT} = \int_0^{L_m(D)} p_r(L)\, dL
 + \int_{L_m(D)}^{\infty} \int_{V_M(L)}^{\infty} p_{VX}(\xi \,|\, L)\, p_r(L)\, d\xi\, dL. \qquad (4) $$

The first term on the right-hand side of eq. (4) represents the outage time
resulting from fades below the system thermal-noise threshold L_m(D),
while the second term represents outage time resulting from degradation
of cross-polarization isolation during fades down to the threshold
L_m(D).

To estimate total outage probability (or outage time), which results
during multipath fades, statistical models are required for both the
copolarized and cross-polarized signals as received on the copolarized
receiving channel during multipath fading conditions. In the deep-fade
region (fades in excess of 10 dB) an accepted statistical model for the
peak envelope voltage of a signal is given by the probability distribution
function

$$ \Pr(l \le L) = 1 - e^{-L^2} \approx L^2. \qquad (5) $$

This distribution function results in the density function

$$ p_r(L) = \frac{d}{dL} \Pr(l \le L) = 2L\, e^{-L^2} \approx 2L \qquad (6) $$

required in the outage probability expression, eq. (2).
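
To give a feel for the deep-fade approximation, the sketch below reads eq. (5) as Pr(l ≤ L) = 1 - exp(-L²) ≈ L² and compares the two forms at a few fade depths. It also evaluates the thermal-noise term of eq. (4) under the same approximation, taking the fade margin as -20 log₁₀ L_m (the definition the paper gives in eq. (10)). Function names are illustrative.

```python
import math

def fade_cdf_exact(level):
    """Pr(l <= L) = 1 - exp(-L**2)."""
    return -math.expm1(-level * level)

def fade_cdf_deep(level):
    """Deep-fade approximation Pr(l <= L) ~ L**2."""
    return level * level

def thermal_noise_term(fade_margin_db):
    """First term of eq. (4) with p(L) ~ 2L: integral of 2L from 0 to L_m = L_m**2."""
    threshold_level = 10.0 ** (-fade_margin_db / 20.0)
    return threshold_level ** 2

for depth_db in (10, 20, 40):
    level = 10.0 ** (-depth_db / 20.0)
    exact = fade_cdf_exact(level)
    approx = fade_cdf_deep(level)
    # Relative error of the approximation shrinks as the fade deepens
    print(depth_db, (approx - exact) / exact)
```

Under these assumptions the approximation is off by roughly 5 percent at a 10-dB fade and by about 0.5 percent at 20 dB, and a 40-dB fade margin gives a thermal-noise term of 10⁻⁴.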

It is shown below that restriction of the statistical model for the peak 
envelope voltage L to the deep-fade region does not restrict results of 
practical interest. This fortunate circumstance results because outage 
statistics accumulate almost entirely during time intervals that corre- 
spond to this deep-fade region. 

With the proposed model for the cross-polarized interference voltage,
it can easily be shown that the conditional probability density function
of the envelope of the cross-polarized interference voltage at a fixed fade
depth L is given by the expression

$$ p_{VX}(V \,|\, L) = \frac{V}{\sigma_{RX}^2}\,
 I_0\!\left(\frac{C_0 L V}{\sigma_{RX}^2}\right)
 \exp\!\left(-\frac{V^2 + (C_0 L)^2}{2\sigma_{RX}^2}\right), \qquad (7) $$

where σ_RX² = variance of the gaussian component of the cross-polarized
interference voltage, I₀(·) is the modified Bessel function of order zero,
and 0 ≤ C₀ ≤ 1.0 is (approximately) the ratio of the cross-polarized
interference to copolarized envelope voltages received during unfaded
propagation conditions, i.e.,

$$ C_0 = 10^{-\mathrm{XPD}_0/20}, $$

where XPD₀ is the cross-polarization discrimination ratio with no
multipath fading. (Voltages used here and in all subsequent expressions are
given normalized to their unfaded levels.) Expression (7) is known as
the Rice-Nakagami probability density function.

THE BELL SYSTEM TECHNICAL JOURNAL, MAY-JUNE 1977

Fig. 2. DPF-channel digital radio performance curves plotted for constant
bit-error rate. (Axes: CIR in decibels versus CNR in decibels; recoverable
curve labels: M_L (10⁻⁶), M_L (10⁻³), T_RP (10⁻³).)
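
The density in eq. (7) can be evaluated numerically as a sanity check. The sketch below assumes the standard Rice-Nakagami form with direct component C₀L and gaussian standard deviation σ_RX, and confirms that it integrates to about one. The Bessel series, the trapezoid rule, and the sample values (30-dB hardware XPD, 20-dB fade, σ_RX = 0.01) are illustrative choices, not values from the paper.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function of order zero via its power series.
    Adequate for the moderate arguments used here; not a general routine."""
    q = (x / 2.0) ** 2
    term = 1.0
    total = 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total

def c0_from_xpd(xpd0_db):
    """C0 = 10**(-XPD0/20): unfaded cross- to copolarized voltage ratio."""
    return 10.0 ** (-xpd0_db / 20.0)

def rice_nakagami_pdf(v, fade_level, c0, sigma_rx):
    """Density of the interference envelope V at normalized fade level L."""
    s2 = sigma_rx ** 2
    direct = c0 * fade_level
    return (v / s2) * bessel_i0(direct * v / s2) * math.exp(
        -(v * v + direct * direct) / (2.0 * s2))

def trapezoid(f, a, b, n=4000):
    """Composite trapezoid rule on [a, b] with n panels."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

# Hypothetical inputs: XPD0 = 30 dB, L = 0.1 (20-dB fade), sigma_RX = 0.01
c0 = c0_from_xpd(30.0)
area = trapezoid(lambda v: rice_nakagami_pdf(v, 0.1, c0, 0.01), 0.0, 0.1)
print(area)
```

Integrating out to ten standard deviations captures essentially all of the probability mass for these inputs, so the printed area should be very close to 1.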


2.2 Characterization of channel performance at constant bit-error rate 


A graphical representation of the relation between carrier-to-inter- 
ference ratio (CIR) and carrier-to-(thermal) noise ratio (CNR) at constant 
bit error rate for a DPF channel can be derived from recent theoretical
studies.¹⁰,¹¹ The two dashed curves shown in Fig. 2 were obtained from
results given in the above referenced works and represent the relationship
between CIR and CNR at bit-error rates of 10⁻³ and 10⁻⁶. For convenient
reference, such curves are referred to as DPF-channel performance
curves in this study.

For a single interferer (the cross-polarized interference signal of the
DPF channel), the relation between V_TX and CIR is

$$ \mathrm{CIR} = 20 \log_{10}(V_C / V_{TX}). \qquad (8a) $$

Note that, in terms of the normalized voltage V_M(L), which appears in
(3), the values of CIR that correspond to a DPF-channel performance
curve are given by the expression

$$ \mathrm{CIR}_{\mathrm{BER}_0} = -20 \log_{10}[V_M(L)]. \qquad (8b) $$

The relationship of L to CNR is given by

$$ \mathrm{CNR}(D) = \mathrm{CNR}_0(D) - \mathrm{FD}(L), \qquad (8c) $$

where CNR₀(D) is the nominal unfaded design value of CNR for a path
of length D, and

$$ \mathrm{FD}(L) = -20 \log_{10}(L) \qquad (8d) $$

is the channel fade depth (L as defined above). The quantity CNR₀(D)
can also be expressed as

$$ \mathrm{CNR}_0(D) = \mathrm{CNR}_{th} + \mathrm{FM}(D), \qquad (9) $$

where CNR_th is the thermal-noise threshold value of CNR for a given BER₀
with no interference present, and FM(D) is the system thermal-noise fade
margin for path length D. The fade margin can be written as

$$ \mathrm{FM}(D) = -20 \log_{10}[L_m(D)], \qquad (10) $$

where L_m(D) is the normalized thermal-noise threshold signal level as
previously defined.
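
The bookkeeping in eqs. (8c) through (10) is easy to mis-sign, so a short sketch may help. It assumes the relations as read here: fade depth FD = -20 log₁₀ L, unfaded CNR equal to threshold CNR plus fade margin, and L_m obtained by inverting eq. (10). Function names and the sample numbers are illustrative.

```python
import math

def fade_depth_db(level):
    """Eq. (8d): fade depth in dB for normalized envelope level L (0 < L <= 1)."""
    return -20.0 * math.log10(level)

def threshold_level(fade_margin_db):
    """Inverse of eq. (10): normalized level L_m at the thermal-noise threshold."""
    return 10.0 ** (-fade_margin_db / 20.0)

def cnr_db(cnr_threshold_db, fade_margin_db, level):
    """Eqs. (8c) and (9): faded CNR = CNR_th + FM - FD(L)."""
    return cnr_threshold_db + fade_margin_db - fade_depth_db(level)

# Hypothetical link: 13.4-dB threshold CNR, 40-dB fade margin
l_m = threshold_level(40.0)
print(l_m, cnr_db(13.4, 40.0, l_m))
```

A fade exactly as deep as the fade margin brings the CNR back to its threshold value, which is the consistency condition tying the three equations together.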

The solid curves shown in Fig. 2 present data from laboratory
measurements of 3A-RDS performance for 10⁻³ and 10⁻⁶ bit-error rates. Note
that threshold values for CIR are apparent from this figure, i.e., a value
CIR₀ below which a given BER is exceeded independent of CNR. Use of
both theoretical and experimental channel performance curves in this 
modeling study is intended to provide some insight into those non-ideal 
channel performance factors that significantly influence multipath 
outage time. These comparisons are unique to this paper. 


2.3 Multipath-fading outage time 


Experimental results on multipath fading statistics at 11 GHz provide
the basis for obtaining estimates of expected DPF radio-channel outage
time. From these measurements the expression for multipath-caused
outage time in minutes per year can be written as

$$ T_{OUT} = r\, T_0\, P_{OUT}, \qquad (11) $$

where

r = multipath occurrence factor,
T₀ = (t/50)(1.33 × 10⁵) minutes, 35° ≤ t ≤ 75°,

and t is average temperature in °F.

The variable T₀ represents the time period during which fading
outages can accumulate (see Vigants¹²). The multipath occurrence factor
r is a normalization coefficient required to account for the influence of
climate, terrain, signal frequency, and path length on the quantitative
statistical behavior of a signal during multipath fades, i.e., it is required
to account for variations in the fraction of time that multipath occurs.
This factor is defined by the expression

$$ r = c\,(f/4)\, D^3\, 10^{-5}, \qquad (12) $$

where

c = geographic factor accounting for terrain and humidity effects
(typically 0.25 ≤ c ≤ 4, with c = 1.0 for average terrain and
climate).

f = frequency in GHz.

D = path length in miles.

Fig. 3. Graphic representation of outage-probability computation.
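
Equations (11) and (12) combine into a one-line outage-time estimate. The sketch below assumes the occurrence factor has the form r = c (f/4) D³ × 10⁻⁵ with D in miles (the exponent on D is garbled in the scanned text and is read here as 3) and T₀ = (t/50)(1.33 × 10⁵) minutes. The function names and the sample outage probability are hypothetical.

```python
def occurrence_factor(c, freq_ghz, path_miles):
    """Multipath occurrence factor r, eq. (12), assuming a D**3 dependence."""
    return c * (freq_ghz / 4.0) * path_miles ** 3 * 1.0e-5

def t0_minutes(avg_temp_f):
    """Time base T0 in minutes per year for average temperature t in deg F."""
    return (avg_temp_f / 50.0) * 1.33e5

def outage_minutes_per_year(p_out, c, freq_ghz, path_miles, avg_temp_f):
    """Eq. (11): T_OUT = r * T0 * P_OUT."""
    return occurrence_factor(c, freq_ghz, path_miles) * t0_minutes(avg_temp_f) * p_out

# Average terrain (c = 1), 11 GHz, 26.4-mile hop
r = occurrence_factor(1.0, 11.0, 26.4)
print(r)
# Hypothetical conditional outage probability of 1e-4 at t = 55 deg F
print(outage_minutes_per_year(1.0e-4, 1.0, 11.0, 26.4, 55.0))
```

Because r scales with the cube of the path length under this reading, halving the hop length cuts the occurrence factor by a factor of eight before any change in P_OUT is considered.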


III. OUTAGE ESTIMATES
3.1 Graphical interpretation 

The expressions given in Section II for outage probability during 
multipath fading can be interpreted graphically in terms of a DPF 
channel performance curve. In the discussion that follows, refer to Fig. 
3, which shows a hypothetical performance curve for some fixed 
BERo. 

Associated with each point in the plane defined by the CNR and CIR
axes is a probability density for the occurrence, during multipath fading
conditions, of that pair of values (CIR, CNR) that define this point. All
points lying along a line perpendicular to the CNR axis correspond to a
fixed fade depth (cross-hatched strip), while all those points lying below
the performance curve correspond to the condition BER ≥ BER₀ (shaded
region).

In terms of Fig. 3, the expression for the probability P_c(BER ≥
BER₀|L), eq. (3), represents the integration of the probability density
of occurrence of all points along a line (fixed value L) perpendicular to
the CNR axis and lying in the region below the performance curve (double
cross-hatched portion of strip). The expression for total outage
probability P_OUT is then the integration of the probabilities along all such
strips between CNR_th and CNR₀(D) (shaded region). The entire region
to the left of the vertical asymptote of the DPF-channel performance
curve (at CNR_th) is associated with the outage probability component
that results from fades below the system thermal noise threshold.


3.2 Analytic results 


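The integration of the density function p_VX(V|L) indicated in eq.
(3) can be carried out to give the analytic expression

$$ P_c(\mathrm{BER} \ge \mathrm{BER}_0 \,|\, L) =
\begin{cases}
\exp\!\left[-\dfrac{a^2(L) + b^2(L)}{2}\right]
\displaystyle\sum_{k=0}^{\infty} \left[\sum_{j=0}^{k} \frac{\left(b^2(L)/2\right)^j}{j!}\right]
\frac{a^{2k}(L)}{2^k\, k!}, & \mathrm{CNR} \ge \mathrm{CNR}_{th}, \\[2ex]
1.0, & \mathrm{CNR} < \mathrm{CNR}_{th},
\end{cases} \qquad (13) $$

where

$$ a(L) = \frac{C_0 L}{\sigma_{RX}}, \qquad (14) $$

$$ b(L) = \frac{V_M(L)}{\sigma_{RX}}. \qquad (15) $$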

For convenience in computations, a DPF-channel performance curve
is represented analytically by the expression

$$ \mathrm{CIR}(D) = K_1 \log_{10}\!\left[1.0 - 10^{-K_2(\mathrm{CNR} - \mathrm{CNR}_{th})/(K_3 - \mathrm{CNR})}\right] + \mathrm{CIR}_0, \qquad (16) $$

where K₁, K₂, and K₃ are fitting parameters determined empirically to
approximate closely the data points as shown in Fig. 2. Table I gives sets
of values for the various parameters in eq. (16) used to approximate the
performance curves required for subsequent calculations.


Table I. Empirical DPF-channel performance curve parameters

Curve               K₁       K₂     K₃     CNR_th    CIR₀
T_RP, BER = 10⁻³    -12.90   1.00   35.0   10.25     4.1
M_L,  BER = 10⁻³    -7.55    1.09   30.0   13.399    9.95
T_RP, BER = 10⁻⁶    -13.79   0.95   35.0   13.85     4.85
M_L,  BER = 10⁻⁶    -8.70    0.85   30.0   17.3998   12.9

T_RP signifies DPF-channel performance characteristics derived from
theoretical results given by Rosenbaum¹⁰ and Prabhu.¹¹

M_L signifies DPF-channel performance characteristics derived from
laboratory measurements on 3A-RDS.


Table I also establishes the notation that denotes the DPF-channel
performance curve used to obtain specific results: T_RP denotes use of
the DPF-channel performance curves derived from results obtained by
Rosenbaum¹⁰ and Prabhu;¹¹ M_L denotes use of the DPF-channel
performance curves derived from 3A-RDS laboratory measurements.
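
The fitted curve of eq. (16) can be evaluated directly from the Table I entries. The sketch below assumes the exponent reads -K₂(CNR - CNR_th)/(K₃ - CNR), which makes the required CIR grow without bound as CNR approaches CNR_th and settle toward CIR₀ as CNR approaches K₃. The dictionary keys and function name are illustrative.

```python
import math

# (K1, K2, K3, CNR_th, CIR0) as listed in Table I
TABLE_I = {
    ("T_RP", "1e-3"): (-12.90, 1.00, 35.0, 10.25, 4.1),
    ("M_L", "1e-3"): (-7.55, 1.09, 30.0, 13.399, 9.95),
    ("T_RP", "1e-6"): (-13.79, 0.95, 35.0, 13.85, 4.85),
    ("M_L", "1e-6"): (-8.70, 0.85, 30.0, 17.3998, 12.9),
}

def required_cir_db(cnr_db, k1, k2, k3, cnr_th, cir0):
    """CIR on the constant-BER performance curve at a given CNR, eq. (16).
    The fit is only meaningful for CNR_th < CNR < K3."""
    if not cnr_th < cnr_db < k3:
        raise ValueError("CNR must lie strictly between CNR_th and K3")
    exponent = -k2 * (cnr_db - cnr_th) / (k3 - cnr_db)
    return k1 * math.log10(1.0 - 10.0 ** exponent) + cir0

# Measured curve at BER = 1e-3, evaluated at a 20-dB CNR
print(required_cir_db(20.0, *TABLE_I[("M_L", "1e-3")]))
```

Sweeping CNR across the valid range for each Table I row gives the four analytic curves used in place of the plotted data.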

Table II presents other expressions and parameter values used in 
subsequent calculations. 


3.3 Propagation path and system hardware effects 


From the previous expressions for outage probability (eq. 2, and eqs.
11 through 15), it is apparent that outage time statistics for a DPF
channel are dependent on the parameters D (hop length), C₀ (equipment
polarization isolation), σ_RX (propagation path phenomena), and r
(multipath occurrence factor).

Measures of the sensitivity of DPF-channel outage statistics to system
hardware characteristics (as represented by C₀) and to propagation path
characteristics (as represented by σ_RX) are important for evaluating
system design and cost.


Table II. Definitions and parameter values pertinent to numerical
calculations

Parameter              Definition
Section loss (SL)      SL = 27.0 + 20.0 log₁₀(D), where D is in miles
System gain (SG)       SG = FM + SL, where FM = fade margin
SG for BER = 10⁻⁶      SG = 108 dB
SG for BER = 10⁻³      SG = 112 dB
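
The Table II definitions fix the thermal-noise fade margin once a hop length is chosen, since FM = SG - SL. A minimal sketch, with illustrative names and the hop length chosen only as an example:

```python
import math

# System gain in dB for each outage-threshold BER, as listed in Table II
SYSTEM_GAIN_DB = {"1e-6": 108.0, "1e-3": 112.0}

def section_loss_db(path_miles):
    """SL = 27.0 + 20.0 log10(D), D in miles."""
    return 27.0 + 20.0 * math.log10(path_miles)

def fade_margin_db(path_miles, ber_key):
    """FM = SG - SL for the chosen BER threshold."""
    return SYSTEM_GAIN_DB[ber_key] - section_loss_db(path_miles)

for ber_key in ("1e-3", "1e-6"):
    print(ber_key, fade_margin_db(26.4, ber_key))
```

For a 26.4-mile hop this gives a fade margin of roughly 56.6 dB at BER = 10⁻³ and 52.6 dB at BER = 10⁻⁶, and each doubling of hop length costs about 6 dB of margin.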

Fig. 4. RMS value of random component of cross-polarized interference
voltage as a function of path length: power law approximations to
experimental data points.


3.3.1 Propagation-path scattering hypothesis 


The parameter σ_RX², a measure of the energy in the random component
of the cross-polarized signal, is probably dependent on a variety of
propagation-path parameters. At present, this conjectured path
dependence is not clearly understood theoretically, nor is sufficient
experimental data available to establish an empirical model. Experimental
measurements⁴ at Palmetto, Ga., taken on two propagation paths,
provide the only directly applicable data for σ_RX². These two data points are
shown plotted in Fig. 4, along with various power law relationships
between σ_RX² and distance D.

To compute outage statistics via the expressions for P_OUT [eq. (2)],
a functional dependence of σ_RX² on path length D is required. Using the
two available data points, we find that an approximate path-length
power-law dependence of D⁴ would provide a reasonable fit (12 dB per
double distance). However, this rate of increase in energy of the random
cross-polarized signal component seems too large from the standpoint
of physical intuition to apply over any substantial path length
interval.
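
The "12 dB per double distance" figure is just the D⁴ power law expressed in decibels. A short sketch of the conversion, with illustrative function names:

```python
import math

def db_per_doubling(exponent):
    """Power change in dB when distance doubles, for a D**exponent power law."""
    return 10.0 * exponent * math.log10(2.0)

def power_law_change_db(d_from, d_to, exponent):
    """Power change in dB between two path lengths under a D**exponent law."""
    return 10.0 * exponent * math.log10(d_to / d_from)

print(db_per_doubling(4))
# Change implied by a D**4 law between the two measured path lengths (miles)
print(power_law_change_db(15.9, 26.4, 4))
```

A fourth-power law gives about 12 dB per doubling, and about 8.8 dB between the 15.9-mile and 26.4-mile paths.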

Based on the hypothesis that terrain scattering is a primary
contributor to the energy in V_RX, a very simple, idealized, theoretical
estimate of σ_RX² as a function of path length D was obtained for three
antenna tower heights (see Appendix B). Figure 5 summarizes results of these
estimates scaled to the experimental value for σ_RX² obtained on the longer
of the two propagation paths at Palmetto (26.4 mi/42.5 km). This
empirical scaling of the estimate for σ_RX² is equivalent to use of a
scattering cross section of about -28 dB, in general agreement with
experimental data from radar terrain-backscatter measurements.¹⁴,¹⁵ Note that
while this scaling also produces reasonable agreement between these
predictions and the measured value of σ_RX² on the shorter path at Palmetto,
the different antennas, widely different antenna heights, and uneven terrain
characterizing this latter path prohibit any detailed comparison.

Fig. 5. Prediction for power in random component of cross-polarized
interference signal based on terrain-scattering model.

In the absence of experimental data for other path lengths, and in view
of the general agreement between the terrain scattering estimates and
the Palmetto data points shown in Fig. 5, theoretical estimates of σ_RX²
are used here to define a functional relation between this parameter and
the path length D for a fixed tower height. The curve in Fig. 5 for the
tower height h = 250 ft will be used for this purpose as a baseline in
calculating outage statistics.


3.3.2 System hardware XPD requirements 


Variations in P_OUT, which result from changes in the value of the
parameter Co, provide a measure of the sensitivity of a DPF radio channel 
to the level of polarization isolation (XPD) in the radio and antenna 
system hardware. For the two bit-error rates used here, Figs. 6 and 7 
summarize calculations of channel outage statistics as a function of radio 
path length for several values of Co spanning a range of practical con- 
cern. 

Two reference curves, “A” and “B,” are shown in Fig. 6 and on all
subsequent figures. Curve “A” represents the multipath fading outage
time for a conventional single-polarization radio channel with identical
thermal-noise fade margin and no interference, and curve “B” a per-hop
one-way outage-time objective typical of that used in the design of
short-haul radio systems. Reference curve “A” allows for comparisons
to show the increase in outage probability suffered by a DPF radio
channel over a conventional channel as a result of fade-dependent
degradation in cross-polarization discrimination.

Fig. 6. Outage-time statistics as a function of hardware XPD for BER₀ = 10⁻⁶.

Outage-objective reference curve “B” shows, by its intersection with 
a DPF-channel outage curve, an upper limit on radio-path hop length 
at a given BERo. This limit on path length is termed a maximum-reli- 
able-path-length (MRPL) and is used here to illustrate implications of 
outage-time variations. The limit actually imposed on hop length for a 
DPF channel in practice must be determined by the combined outage 
probabilities resulting from rain, equipment failures, and multipath 
fading.¹⁶

These calculations indicate that for current DPF-channel performance
characteristics, channel outage during multipath fading is quite
insensitive to the polarization isolation of system hardware as long as a
minimum isolation of 20 dB is maintained, i.e., XPD₀ ≥ 20 dB. This lower
bound on the acceptable range for hardware XPD can be extended to 15
dB in theory (see T_RP curves in Figs. 6 and 7), and can also be extended
in practice for the larger bit-error-rate criterion (BER ≤ 10⁻³). However,
for the measured DPF-channel performance curve at BER₀ = 10⁻⁶, a
system hardware XPD₀ = 15 dB degrades outage performance by almost
an order of magnitude at any particular path length, and decreases the
intercept of this outage curve and the one-way-objective curve “B” by
15 to 20 percent.

Fig. 7. Outage-time statistics as a function of hardware XPD for BER₀ = 10⁻³.
(Axes: outage time in minutes versus path length in kilometers.)

These results have significance from two standpoints. First, the XPD 
requirements on system hardware are moderate and are therefore less 
costly to implement. This is particularly important for the radio antenna 
system. Secondly, maintenance requirements for XPD characteristics 
are relatively loose, again producing a positive cost factor from an op- 
erational standpoint. 


3.4 Numerical results: estimates of outage statistics 


As discussed in Section 3.3.2, the outage probability is not sensitive
to changes in the value of the parameter $C_0$. Consequently, all
computations are performed for the upper and lower limits on the practical
range of interest for $C_0$, i.e., 20 dB $\le 20 \log_{10}(C_0^{-1}) \le$ 45 dB.
Since these computed results bracket those for all intermediate values of
$C_0$, a strip or band of values is shown, in general, rather than a single
curve in the graphical results presented below. Note that in some cases
this “band” has, effectively, zero width and appears to be only a single curve.
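
The two limits on $C_0$ can be made concrete with a short numerical sketch. It relies only on the definition $C_0 = 10^{-(XPD_0/20)}$ given in the symbol table of Appendix A; the function names are hypothetical and chosen for illustration.

```python
import math


def c0_from_xpd(xpd0_db: float) -> float:
    """Unfaded cross- to copolarized voltage ratio, C0 = 10**(-XPD0/20)."""
    return 10.0 ** (-xpd0_db / 20.0)


def xpd_from_c0(c0: float) -> float:
    """Inverse mapping: XPD0 in dB, i.e. 20*log10(1/C0)."""
    return -20.0 * math.log10(c0)


if __name__ == "__main__":
    # The two limits of the practical range quoted in the text.
    for xpd0_db in (20.0, 45.0):
        print(f"XPD0 = {xpd0_db:4.1f} dB -> C0 = {c0_from_xpd(xpd0_db):.5f}")
```

Under this definition the 20-dB limit corresponds to a voltage ratio of 0.1 and the 45-dB limit to a ratio of roughly 0.0056.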

The simple relationship between the outage time ($T_{out}$) and outage
probability ($P_{out}$) for a single radio hop during multipath fading is
given by eq. (11) in Section 2.3. All results given below for $T_{out}$ are
computed according to eq. (11) with an average temperature of t = 55°F,
which gives a value of $T_0 = 1.467 \times 10^5$ minutes.

Note that all results obtained below are for an “average” multipath 
environment corresponding to the value c = 1.0 in the expression (12) 
for the multipath occurrence factor. All results can be applied directly 
to other environments by direct scaling of outage time via a simple 
multiplication by the appropriate value of c. 
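
The quoted base period can be checked numerically. Eq. (11) is not reproduced in this section, so the sketch below rests on two assumptions: the fade-season rule commonly used in Bell System multipath work, $T_0 = (t/50) \times 8 \times 10^6$ seconds, and a simple product form for the outage time. The function names are hypothetical.

```python
def base_period_minutes(avg_temp_f: float) -> float:
    """Base period T0 in minutes per year.

    Assumed rule (not taken from this section): T0 = (t/50) * 8e6 seconds,
    with t the average annual temperature in degrees Fahrenheit.
    """
    seconds = (avg_temp_f / 50.0) * 8.0e6
    return seconds / 60.0


def outage_minutes(p_out: float, avg_temp_f: float = 55.0, c: float = 1.0) -> float:
    """Outage time, assuming T_out = c * T0 * P_out.

    The factor c is the direct scaling by environment described in the text;
    the product with P_out is an assumed reading of eq. (11).
    """
    return c * base_period_minutes(avg_temp_f) * p_out


if __name__ == "__main__":
    print(f"T0 at 55 F: {base_period_minutes(55.0):.0f} minutes")
```

With t = 55°F the assumed rule gives about 146,667 minutes, in agreement with the value of $1.467 \times 10^5$ minutes stated above.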


3.4.1 Deep-fade restriction on statistical model 


Figures 8 and 9 show representative sets of curves that display the
dependence of outage probability on the faded signal level L. The curves
in Fig. 8 represent the integrand in the expression for $P_{out}$ (eq. 2), and
are computed for constant path length D, for the value $XPD_0 = 20$ dB.
From these curves, it is apparent that with adequate hardware XPD
($XPD_0 \ge 20$ dB), significant contributions to $P_{out}$ occur only for fades
$L \le 0.32$ ($-20 \log_{10}(L) \ge 10$ dB). This result supports the adequacy of
the statistical model for the signal envelope peak value used here in
calculating outage statistics for a DPF channel.

Fig. 8. Functional dependence of $[P_c(\mathrm{BER} \ge 10^{-6} \mid L)\, p_L(L)]$ on fade depth $[-20 \log_{10}(L)]$
for My channel characteristics, with $XPD_0 = 20$ dB (vertical axis: probability density).

Fig. 9. Functional dependence of $[P_c(\mathrm{BER} \ge \mathrm{BER}_0 \mid L)\, p_L(L)]$ on fade depth $[-20 \log_{10}(L)]$
for My and Trp channel-performance curves.


3.4.2 Path-length dependence of $T_{out}$


Figure 10 summarizes results obtained by computing outage time for
average terrain and climate conditions (c = 1.0 in eq. 12) on a single hop
for a DPF channel. Curves are presented for two bit-error rates:
$\mathrm{BER}_0 = 10^{-6}$ and $\mathrm{BER}_0 = 10^{-3}$.

Comparison of the curves computed using theoretical and measured
DPF-channel performance curves shows the effect of degradations
introduced by intersymbol interference and other non-ideal conditions
in the digital terminal and associated radio equipment. Most important
in this respect is the threshold value $\mathrm{CIR}_0$, which appears to dominate
other parameters.

From Figure 10, it is apparent that for the model parameters used here
the error-rate criterion used to specify an MRPL (or, for a particular path
length of interest, the expected outage time) is not critical. Theoretically,
the change in MRPL that results in going from $10^{-3}$ BER to $10^{-6}$ BER is
less than 5 percent, while for results based on measured DPF-channel
performance, the change is about 10 percent.

At a particular path length, the increase in outage time incurred by
going from $10^{-3}$ to $10^{-6}$ BER is at most a factor of 2 for results based on
either the theoretical or measured DPF-channel performance curves.

Fig. 10. Outage time vs path length for Trp and My channel-performance curves.


The outage time estimates obtained for measured DPF-channel per- 
formance result in a net decrease of about 20 to 25 percent in MRPL from 
those estimates obtained using theoretical performance. 

Figure 11 draws a comparison between outage-time estimates obtained
under the terrain-scattering hypothesis for two different antenna tower
heights for $\mathrm{BER}_0 = 10^{-6}$. The difference in the estimated outage time for
these two cases decreases with increasing path length, varying from about
three orders of magnitude at 10 km to about a factor of 5 at 35 km.

The MRPL decreases by about 60 percent in going from the 250-ft 
tower to the 75-ft tower, indicating a large increase in cross-polarized 
energy contributed by terrain scattering (see Fig. 5). Assuming that the 
terrain-scattering model is valid (or at least dominant), this result 
suggests that use of large tower heights would be desirable for DPF- 
channel radio systems. 


3.4.3 Dependence of $T_{out}$ on system gain


Fig. 11. Outage-time statistics for two antenna tower heights under terrain-scattering
hypothesis.

An index of performance for a radio system is the quantity known as
system-gain (SG). System gain is defined as the required difference
between transmitted and received signal-power levels necessary to achieve
some standard level of performance. As above, the two performance
standards considered here are $\mathrm{BER}_0 = 10^{-3}$ and $\mathrm{BER}_0 = 10^{-6}$. Typical
system-gain values for commercially available DPF-channel digital radio 
systems vary from 95 to 108 dB for a bit-error rate limit of 107°. 

To evaluate the effect of system gain on outage time statistics,
calculations of $T_{out}$ have been made using SG as a parameter (with
$XPD_0 = 20$ dB). Figures 12 and 13 summarize results obtained for the two
bit-error rate standards, with SG varying from 95 to 115 dB in 5 dB
steps.

Results for the two different error rates are quite similar. Based on
theoretical DPF channel performance, outage-time statistics vary
somewhat (less than a factor of 2) as the system gain increases from 95
to 100 dB, but remain virtually unchanged in the range $100 \le SG \le 115$
dB.

Using measured DPF-channel performance for the model parameters 
used here, outage time is almost totally insensitive to system gain for path 
lengths in excess of 18 to 20 km. For shorter path lengths, some differ- 
ences in outage time do result as system gain is changed, but since the 
values of outage time for these short paths are very small in comparison 
to the system objective, such variations are not of practical concern. 

For path lengths in the vicinity of the MRPL, system gain does not 
appear to affect significantly the outage performance of a DPF chan- 
nel. 
Fig. 12. Outage-time statistics as a function of system gain for DPF radio channel with
Fig. 13. Outage-time statistics as a function of system gain for a DPF channel with
$\mathrm{BER}_0 = 10^{-3}$.
Fig. 14. Sensitivity of outage-time statistics to power level of random component of
cross-polarized signal.


3.4.4 Environmental sensitivity of $T_{out}$


As discussed in Section 3.3.1, outage statistics are dependent on
propagation path characteristics expressed through the parameter $\sigma_{RX}^2$.
Various characteristics of the geographic location of a radio propagation
path will influence the value of $\sigma_{RX}^2$. To explore the sensitivity of results
obtained here to change in the magnitude of $\sigma_{RX}^2$, calculations for $T_{out}$
were carried out with $\sigma_{RX}^2$ as a parameter.

Figure 14 summarizes results obtained by computing $T_{out}$ as a
function of path length for three values of $\sigma_{RX}^2$; in each case, $\sigma_{RX}^2$ has the
same relative variation with D (e.g., the curve shown in Fig. 5 for h =
250 ft). These computations show that $T_{out}$ varies by about one order
of magnitude as $\sigma_{RX}^2$ varies by one order of magnitude (10 dB). (The
midrange value of $\sigma_{RX}^2$ corresponds to that obtained from the
Atlanta-to-Palmetto, Georgia data set at 42.5 km.)

These results indicate the importance of obtaining data on $\sigma_{RX}^2$ from
a variety of locations to establish its range of variability. Depending on
the magnitude of such variations, the allowable path length for reliable
digital communication via DPF radio-channels could vary significantly
with geographic location; e.g., the results shown in Fig. 14 show
potential changes of ±25 to ±30 percent in allowable path length as
$\sigma_{RX}^2$ varies by ±10 dB.
IV. SUMMARY

4.1 Radio-system XPD requirements 


An important consideration for the practical implementation of DPF 
digital radio systems is the degree of isolation required between the two 
orthogonal, linearly polarized waves transmitted at each frequency. 
Severe requirements on such isolation (XPD) imply increased system 
costs resulting both from hardware design and system maintenance. 

In general, a minimum requirement of 20 to 25 dB for hardware XPD 
is not severe from either a design or maintenance standpoint. This level 
of XPD results in DPF channel error-rate performance well above the 
minimum service standards used here for unfaded signals (see Fig. 2). 

The estimates of outage statistics obtained in this study with a variety 
of model parameters indicate that outage performance for current DPF 
channels during multipath fading is not materially affected by the sys- 
tem-hardware XPD as long as it remains better than 20 dB (see Figs. 6, 
7, and 10). Multipath-related outages do not, therefore, impose any se- 
vere hardware design or maintenance requirements. 


4.2 Maximum reliable path length 


The example outage estimates computed in the above sections for
representative system parameter values suggest that for a 0.02 percent
two-way reliability objective for a 400-km route, and with a high antenna
tower (250 feet), the maximum reliable path length (MRPL) for a DPF
channel is about 21.5 km (13.4 mi) for a bit-error rate of $10^{-3}$ in the
absence of other outage effects, and for an average multipath environment
(see Fig. 10). This MRPL can vary from 17.5 km (10.9 mi), on Gulf Coast
or overwater paths, to 31.0 km (19.3 mi) in dry, mountainous regions
(Albuquerque), reflecting changes in the magnitude of the multipath
occurrence factor for these environmental extremes.

For a bit-error rate $\mathrm{BER} \le 10^{-6}$, the range of MRPL in these example
calculations is 17.0 km (10.6 mi) to 26.0 km (16.2 mi) with a value of 20.0
km (12.4 mi) for an average multipath environment.

The rather steep slope of each outage-time curve of Fig. 10 in the
vicinity of its intersection with the outage-time objective curve “B”
indicates that the changes in MRPL resulting from inclusion of rain and
equipment outages will not be severe, i.e., a small decrease in MRPL
allows for a substantial contribution to outage time from these other
sources of channel outage. Note that the preceding statement does not
hold for rain-limited propagation paths (e.g., in the Miami, Florida area)
where multipath outages are strongly dominated by rain-induced
outages. The actual magnitude of such changes will, however, depend on
the rain statistics for a given geographic location.
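
As a rough illustration of the objective against which these path lengths are judged, the following sketch converts the 0.02 percent two-way objective for a 400-km route into a per-hop one-way outage time. Two assumptions are made that this section does not state: the two-way objective is split equally between the two directions, and it is prorated linearly with hop length. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def per_hop_objective_minutes(hop_km: float,
                              route_km: float = 400.0,
                              two_way_unavailability: float = 0.0002) -> float:
    """Per-hop one-way outage objective in minutes per year.

    Assumptions (illustrative only): half of the two-way objective is
    allotted to each direction, and the route objective is prorated
    linearly with hop length.
    """
    two_way_route = two_way_unavailability * MINUTES_PER_YEAR
    one_way_route = 0.5 * two_way_route
    return one_way_route * hop_km / route_km


if __name__ == "__main__":
    for hop_km in (17.5, 21.5, 31.0):
        print(f"{hop_km:5.1f} km hop: {per_hop_objective_minutes(hop_km):.2f} min/yr")
```

Under these assumptions a 21.5-km hop is allowed a little under 3 minutes of one-way outage per year, which gives a sense of the scale at which the outage curves cross curve “B”.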




4.3 Dependence of path length on system gain 


For digital radio systems with high system gain, the exact value of 
system gain, while significant for the average BER performance of a DPF 
channel, does not significantly affect the MRPL. In the examples con- 
sidered here, changes on the order of 20 percent in system gain (in dB) 
result in very small changes in MRPL—i.e., less than 1.0 km. 


4.4 General conclusions 


The model proposed and developed here for DPF-channel outages 
during multipath fading, together with results obtained from the ex- 
ample calculations carried out for representative system and environ- 
mental parameter values, lead to the following general conclusions: 


(i) Outage times during multipath fading conditions are between two
and three orders of magnitude larger for a DPF channel of current design
than for a conventional (single-polarization-per-frequency) radio channel
with no interference.

(ii) The threshold carrier-to-interference ratio ($\mathrm{CIR}_0$) below which
a given bit-error rate cannot be obtained, independent of the
signal-to-noise (thermal) ratio, is the channel performance characteristic of
greatest importance to multipath outages. Up to an order of magnitude
improvement in outage time could be obtained if a near theoretical value
of $\mathrm{CIR}_0$ could be achieved through hardware/design improvements.

(iii) Multipath outage time is dependent on geographic and
environmental conditions expressed through the multipath occurrence factor
r and the statistical parameter $\sigma_{RX}^2$. Outage-time estimates computed
here scale almost linearly with changes in $\sigma_{RX}^2$, other model parameters
remaining approximately constant.

(iv) Under the hypothesis that terrain scattering is the dominant
mechanism for the random component of the interference signal in a DPF
channel, multipath outage time for such channels can be minimized
through use of high antenna towers and, when possible, through judicious
choice of radio-hop terrain.
APPENDIX A
Symbol Table 


$\alpha(L)$ = $C_0 L/\sigma_{RX}$.
$\beta(L)$ = $V_M(L)/\sigma_{RX}$.
$\sigma_{RX}$ = Standard deviation of random component of cross-polarized signal.
$\sigma_N$ = Normalized, cross-polarizing, bistatic, terrain-scattering cross section.
BER = Bit-error rate.
$\mathrm{BER}_0$ = Fixed maximum value for BER.
c = Geographic multipath-fading normalization factor.
$C_0$ = Ratio of cross-polarized to copolarized signal voltages during periods of no
      multipath fading, i.e., $C_0 = 10^{-(XPD_0/20)}$.
CIR = Carrier-to-interference ratio in dB for single interferer arising from
      cross-polarized DPF-channel signal.
CNR = Carrier-to-(thermal)-noise ratio in dB.
$\mathrm{CNR}_0$ = Nominal unfaded design value of CNR.
$\mathrm{CIR}_0$ = Value of CIR below which $\mathrm{BER}_0$ is exceeded independent of CNR.
$\mathrm{CNR}_t$ = Thermal-noise threshold value of CNR at a given $\mathrm{BER}_0$ with no
      interference present.
D = Path length (miles or kilometers).
f = Frequency in GHz.
FD = Channel fade depth [$-20 \log_{10}(L)$].
FM = System thermal-noise fade margin.
h = Tower height for radio antennas.
$I_0$ = Modified Bessel function of order zero.
$\{K_1, K_2, K_3\}$ = Empirical fitting parameters for DPF-channel performance curves.
L = Ratio of faded to unfaded peak envelope voltage.
$L_m$ = Minimum allowable normalized amplitude of the received-signal envelope
      (for a path length D) required to maintain some specified maximum channel $\mathrm{BER}_0$.
MRPL = Maximum-reliable-path-length. (The path length determined by the
      intersection of an outage time curve and the system outage-time-objective curve.)
My = DPF-channel performance characteristics derived from laboratory
      measurements on 3A-RDS.
$P_c$ = Conditional probability of DPF-channel BER exceeding some specified value
      $\mathrm{BER}_0$ at a given, fixed multipath fade level L.
$p_L(L)$ = Probability density function of multipath fading signal.




$P_L$ = Probability distribution function of multipath fading signal.
$P_{out}$ = Probability of DPF-channel BER exceeding some specified value $\mathrm{BER}_0$
      during periods of multipath fading.
$P_r$ = Receiving antenna radiation pattern.
$P_t$ = Transmitting antenna radiation pattern.
$p_{VX}$ = Conditional probability density of cross-polarized interference envelope voltage.
r = Multipath occurrence factor.
$R_1$ = Distance from transmitting antenna to terrain-scatter point.
$R_2$ = Distance from terrain-scatter point to receiving antenna.
SG = System-gain (the difference between transmitted and received signal power
      levels necessary to achieve some standard level of performance, i.e., BER = $\mathrm{BER}_0$).
t = Average temperature in °F.
$T_0$ = Base time period for multipath fading activity during one year.
$T_{out}$ = Total time (in minutes per year) during which the DPF-channel BER can be
      expected to exceed some $\mathrm{BER}_0$ as a result of multipath fading activity.

Trp = DPF-channel performance characteristics derived from 
theoretical results given by Rosenbaum! and Prabhu.!! 

V = Peak envelope voltage.
$V_M$ = Maximum normalized value of $V_{TX}$ for which BER < $\mathrm{BER}_0$.
$\langle v_{TS}^2 \rangle$ = Energy received as a result of terrain scattering, including conversion
      from one linear polarization to the orthogonal linear polarization.
$V_{DX}$ = Peak envelope voltage of direct component of cross-polarized signal.
$V_{RX}$ = Peak envelope voltage of random component of cross-polarized signal.
$V_c$ = Peak envelope voltage of copolarized signal.
$V_{TX}$ = Peak envelope voltage of total cross-polarized signal.
XPD = Cross-polarization-discrimination ratio (ratio of copolarized to
      cross-polarized received signal in dB).
$XPD_0$ = Cross-polarization discrimination ratio during period of no multipath fading.


APPENDIX B 
B.1 Estimation of $\sigma_{RX}^2$ Based on Terrain-Scattering Hypothesis


Terrain scattering along the radio-hop propagation path is one
physical phenomenon that can contribute energy to the random
component of the cross-polarized interference signal in a DPF radio channel.
The magnitude of this component is not easily determined
experimentally or theoretically, but an estimate can be obtained through an
idealized numerical calculation scaled to available data.

Fig. 15. Idealized radio-hop geometry used in terrain-scattering computation.

Figure 15 shows the geometry assumed as a basis for calculating the
energy contributed to the cross-polarized-interference signal by terrain
scattering. The expression used to compute this random-scatter
component is

$$\langle v_{TS}^2 \rangle = \iint dx\,dy \left[ \sigma_N(x,y)\, \frac{P_t(x,y)\, P_r(x,y)}{R_1^2 R_2^2} \right], \tag{17}$$

where $\sigma_N(x,y)$ is a factor proportional to the cross-polarized, bistatic
scattering cross section of the ground and various normalization factors.
Interest here is centered on the functional dependence of $\langle v_{TS}^2 \rangle$ on
radio-hop path length, rather than the absolute magnitude of this
quantity. The functions $P_t$ and $P_r$ are the antenna response patterns
at the transmitter and receiver, respectively.

Given this restricted interest and the very sharply directive radiation
pattern of typical transmitting and receiving antennas at 11 GHz (a
pyramidal horn-reflector-antenna response pattern was used for
calculations performed here), the variation of $\sigma_N(x,y)$ with location is
neglected and $\sigma_N$ is assumed to have unity value. The area over which the
double integration shown in eq. (17) is performed is limited by an error
criterion for the incremental contributions that occur during evaluation of
this expression. The error criterion used corresponds to neglecting
contributions from areas illuminated by very low-level side-lobes of the
transmitting antenna radiation pattern, and sensed by the low-level
side-lobe region of the receiving antenna response pattern.

Normalizing the computed estimates to the experimental values
obtained at Palmetto, Georgia (see Section 3.3.1), the normalized scattering
cross section has a value of about −28 dB, which agrees within a few dB
with radar cross-section measurements on various types of
terrain.$^{14,15}$

The results of the calculations of scattered cross-polarized energy are 
summarized in Fig. 5 for three different tower heights. These curves 
display an approximate path-length dependence varying from D? for 
long paths to D® for short paths. 
Loss Analysis of Single-Mode Fiber Splices
This paper analyzes losses caused by the misalignment of two fibers
joined in a splice. We consider the possibility that the two fibers of
different dimensions are separated in the longitudinal direction and are
tilted or offset with respect to each other. Central to our discussion is
the observation that the modes of single-mode fibers are very nearly
gaussian in shape regardless of the fiber type, step-index or
graded-index. The splice losses are thus related to the corresponding losses of
gaussian beams. We specify the relation between the actual mode field
and the gaussian beam that matches this field optimally. The trade-off
between splice tolerances with respect to tilt and offset is expressed as
an “uncertainty principle.” Because of the near-gaussian nature of
single-mode fiber fields, our results are immediately applicable to the
excitation of single-mode fibers by gaussian-shaped laser beams.


I. INTRODUCTION


Light transmission losses of single-mode fiber splices depend on the 
alignment accuracy of the fiber ends relative to each other.$^{1}$ We assume
that the fibers are immersed in index-matching fluid to minimize re- 
flection losses at the fiber ends. Most troublesome are transverse mis- 
alignments (offsets) and angular misalignments (tilts). Fiber splices 
are surprisingly tolerant of longitudinal misalignment. 

We begin our discussion by showing that the fields of single-mode, 
step-index fibers are very nearly gaussian in shape. This observation 
holds with even more assurance for parabolic-index fibers, because the 
modes of the infinitely extended parabolic-index medium are themselves 
gaussian and are changed only slightly by the truncation of the index 
profile at the core boundary. Once it is established that fiber modes may 
be closely approximated by gaussian field distributions, the evaluation 

of splice losses reduces to the computation of transmission losses between 
misaligned gaussian beams.*? 

We present formulas for relating the width of the gaussian field
distribution to the fiber parameters. An implicit relation for all types of
fibers is given and explicit formulas are derived for the important cases
of step-index and parabolic-index fibers. Next, we present simple
analytical expressions for the transmission coefficient of fiber splices for the
case of longitudinal, transverse, and angular misalignment for fibers of
different dimensions.

Transverse splice tolerances become less stringent for fibers whose 
mode fields extend farther in transverse directions. Wide mode fields 
can be obtained by selecting a core index that is very nearly equal to the 
refractive index of the cladding. However, a wide fiber mode is less tol- 
erant of angular misalignments. The relative tolerance of fiber splices 
with respect to offsets and tilts is expressed as an “uncertainty princi- 
ple.” 

We limit our discussion to “weakly guiding” fibers* defined by the 
relation $n_1/n_2 - 1 \ll 1$, where $n_1$ is the maximum value of the refractive
index of the fiber core and $n_2$ is the value of the cladding index. The
transmission coefficient of weakly guiding fiber modes can be obtained
by matching only the transverse components of the electric field vector
of the two modes; their transverse magnetic field components are
automatically matched approximately. We designate the electric field
vectors of the modes (guided and radiation modes) of the fiber by the
symbol $\mathbf{E}_\nu$. The incident electric field $\mathbf{E}$ at the input end of the fiber can
then be expressed in terms of fiber modes as follows?: 


$$\mathbf{E} = \sum_\nu c_\nu \mathbf{E}_\nu. \tag{1}$$


The summation symbol indicates symbolically summation over guided
modes (only one for single-mode fibers) and integration over radiation
modes. The symbol $\nu$ labels the modes (we use $\nu = 0$ as the label of the
guided mode of the single-mode fiber). Mode orthogonality allows us
to obtain $c_0$ from (1),


$$c_0 = \frac{1}{2P} \int_0^{2\pi} d\phi \int_0^\infty (\mathbf{E} \times \mathbf{H}_0^*) \cdot \mathbf{e}_z \, r\,dr. \tag{2}$$


$\mathbf{H}_0$ is the magnetic-field vector of the guided mode, $\mathbf{e}_z$ is a unit vector
in the direction of the fiber axis, and r and $\phi$ are cylindrical coordinates
in the plane at right angles to the axis of the fiber. We assume that the
fiber is receiving radiation from either an input fiber at a splice or from 
a free-space gaussian laser beam. Correspondingly, E represents the field 
that the first fiber of the splice generates at the input end of the second 
fiber or, alternatively, the gaussian beam mode of a laser. 

The power transmission coefficient, finally, is obtained from (2) by 
the relation, 


$$T = |c_0|^2. \tag{3}$$
II. REPRESENTATION OF THE FIBER MODE AS A GAUSSIAN BEAM


The guided modes of weakly guiding fibers are very nearly transverse 
and linearly polarized.* The electric field vector of the input field consists 
likewise of one dominant transverse component.* Let us assume that the 
input field is gaussian 


$$E_y = \left[ \frac{4\sqrt{\mu_0/\varepsilon_0}\,P}{\pi n_2 w^2} \right]^{1/2} \exp\left(-\frac{r^2}{w^2}\right) e^{-i\beta z}. \tag{4}$$


The refractive index $n_2$ equals the cladding index of the fiber, P is the
power carried by the field and is identical to the P parameter in (2), w
is the width parameter of the gaussian field, $\beta$ is its propagation constant,
and $\mu_0$ and $\varepsilon_0$ are the magnetic permeability and the dielectric permittivity
of vacuum.

We wish to compare the gaussian field to the mode of the step-index 
fiber,° 


$$H_{x0} = -\left[\frac{2 n_2 \sqrt{\varepsilon_0/\mu_0}\,P}{\pi}\right]^{1/2} \frac{W}{a V J_1(U)} \begin{cases} J_0(U r/a), & r \le a, \\ \dfrac{J_0(U)}{K_0(W)}\, K_0(W r/a), & r \ge a. \end{cases} \tag{5}$$


The P parameter is identical to those in (2) and (4); W and U are related
to the important V parameter by the equation,


$$U^2 + W^2 = V^2 = (n_1^2 - n_2^2) k^2 a^2. \tag{6}$$


The free-space propagation constant of plane waves is $k = 2\pi/\lambda$ and a
is the core radius of the fiber. $J_0$ and $J_1$ are Bessel functions and $K_0$ is
the modified Hankel function. The parameter U can be related to the
propagation constant $\beta_0$ [omitted from (5)] as follows:


$$U = (n_1^2 k^2 - \beta_0^2)^{1/2} a. \tag{7}$$


By substitution of (4) and (5) into (2) and (3), we obtain the
transmission coefficient of a gaussian input beam exciting the $HE_{11}$ mode of
a fiber. The r-integral in (2) must be evaluated numerically. It is clear
that the value of T depends on the width parameter w of the gaussian
beam; T assumes a maximum as a function of w. The maximum value
of T is plotted as a function of V in Fig. 1. It is remarkable how closely
T approaches unity over the range of V-values shown in the figure. At
the important point V = 2.4, we have T = 0.9965. V = 2.4 is close to the
largest value at which the fiber supports only one mode. The next higher
mode comes in at V = 2.405. It is apparent that at V = 2.4 the field
distribution of the fiber mode matches the gaussian field almost perfectly.
The best match is achievable at V = 2.8; T decreases very slowly for
larger values of V. For smaller V-values, the decrease and consequently
the mismatch between gaussian field and fiber mode is more pronounced;
but even at V = 1.2, we have T = 0.946, a value very close to unity so that,
even for such small values of V, the gaussian beam is a reasonably good
approximation of the fiber mode. It can be shown that the optimum value
of w divided by the core radius is only a function of V. Figures 2 and 3
show the optimum values of w/a as a function of V as the solid line. This
function can be approximated very closely (to within a fraction of 1
percent) by the empirical formula,

$$\frac{w}{a} = 0.65 + \frac{1.619}{V^{3/2}} + \frac{2.879}{V^6}. \tag{8}$$

This equation holds, of course, only for step-index fibers. The meaning
of the dotted curve in Fig. 2 will be explained later.

Fig. 1. Maximum value of transmission coefficient between a perfectly aligned and
optimally adjusted gaussian beam and a single-mode step-index fiber.

It is desirable to have similar relations for graded-index fibers because
this would enable us to predict their splice losses. The fields of general
graded-index fibers are not known explicitly, so that we cannot use eqs.
(2) through (4) to optimize the width of the corresponding gaussian
beams. However, we can use a different approach. If we insist that (4)
should be used to approximate the guided mode of a single-mode fiber
with refractive index distribution n(r) for r < a and n(r) = $n_2$ for r > a,
we may substitute (4) into the wave equation

$$\frac{d^2 E_y}{dr^2} + \frac{1}{r}\frac{dE_y}{dr} + [n^2(r)k^2 - \beta^2] E_y = 0 \tag{9}$$

and obtain

$$\frac{4}{w^2}\left(\frac{r^2}{w^2} - 1\right) + n^2(r)k^2 - \beta^2 = 0. \tag{10}$$

Fig. 2. Normalized optimum width parameter w/a as a function of the V-parameter
for step-index fibers. The dotted line is obtained from the approximate procedure
expressed in eqs. (19) and (20).

Fig. 3. Same as Fig. 2 for wider range of V-values.

For a graded-index distribution,

$$n(r) = n_1\left[1 - \Delta \left(\frac{r}{a}\right)^g\right] \tag{11}$$

with g = 2, (10) can be satisfied exactly. Note that (11) is an infinitely
extended parabolic-index profile. In this case, we find

$$w = \sqrt{2/V}\, a \tag{12}$$

and

$$\beta = [n_1^2 k^2 - 2V/a^2]^{1/2}. \tag{13}$$

We define the V-parameter for any value of g by the equation

$$V = n_1 k a \sqrt{2\Delta}. \tag{14}$$

This expression is also a good approximation of (6), if we use

$$\Delta = 1 - \frac{n_2}{n_1} \ll 1. \tag{15}$$

Equations (12) and (13) are not correct for actual parabolic-index fibers
whose refractive index distributions are given by (11) (with g = 2) only
for r < a, but assume the form n(r) = $n_2$ for r > a. We refer to profiles
of this kind as truncated index distributions.
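
The closeness of the two definitions of V can be illustrated with a short sketch that evaluates (6) and (14) side by side. The fiber values used in the example (indices, core radius, wavelength) and the function names are hypothetical and serve only as an illustration.

```python
import math


def v_exact(n1: float, n2: float, core_radius: float, wavelength: float) -> float:
    """V from eq. (6): V = k * a * sqrt(n1^2 - n2^2), with k = 2*pi/lambda."""
    k = 2.0 * math.pi / wavelength
    return k * core_radius * math.sqrt(n1 * n1 - n2 * n2)


def v_weakly_guiding(n1: float, n2: float, core_radius: float, wavelength: float) -> float:
    """V from eqs. (14) and (15): V = n1 * k * a * sqrt(2*Delta), Delta = 1 - n2/n1."""
    k = 2.0 * math.pi / wavelength
    delta = 1.0 - n2 / n1
    return n1 * k * core_radius * math.sqrt(2.0 * delta)


if __name__ == "__main__":
    # Hypothetical weakly guiding fiber, lengths in meters.
    args = (1.460, 1.456, 4.0e-6, 1.3e-6)
    print(f"V from (6):  {v_exact(*args):.4f}")
    print(f"V from (14): {v_weakly_guiding(*args):.4f}")
```

For an index difference of a few tenths of a percent, as in this example, the two values agree to well within 1 percent.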
It appears reasonable to attempt an approximate evaluation of w and
$\beta$ by squaring (10) and integrating over the entire infinite cross-section.
The parameter optimization is then achieved by requiring that the
expression so obtained assumes a minimum value. After dividing by the
constant $2\pi$, we obtain

$$J = \int_0^\infty \left[\frac{4}{w^2}\left(\frac{r^2}{w^2} - 1\right) + n^2(r)k^2 - \beta^2\right]^2 |E_y|^2\, r\,dr = \min. \tag{16}$$

This empirical extremum principle leads to two equations:

$$\frac{\partial J}{\partial w} = 0 \tag{17}$$

and

$$\frac{\partial J}{\partial \beta} = 0. \tag{18}$$

It is actually the expression in brackets that should vanish if the equation
could be satisfied exactly. $E_y$ serves only the purpose of a weighting
function. For this reason, we consider the width parameter of $E_y$ as
constant, unaffected by the differentiation in (17). (It was found
empirically that this procedure leads to more accurate results.) Substitution
of (4) into (16) allows us to obtain from (17) and (18)

$$\beta^2 = \frac{4k^2}{w^2}\int_0^\infty n^2(r)\exp(-2r^2/w^2)\, r\,dr - \frac{2}{w^2} \tag{19}$$

and

$$\int_0^\infty \left[\frac{4}{w^2}\left(\frac{r^2}{w^2} - 1\right) + n^2(r)k^2 - \beta^2\right]\left(\frac{2r^2}{w^2} - 1\right)\exp(-2r^2/w^2)\, r\,dr = 0. \tag{20}$$

Equations (19) and (20) must be solved simultaneously. Analytical
solutions are impossible to obtain so that we resort to numerical
solutions.
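
One way such a numerical solution might be organized is sketched below. The sketch is not the procedure used for the figures; it assumes the weighted-residual reading of (19) and (20) in which the gaussian weight is held fixed during the differentiation with respect to w, a profile written as $n^2(r) = n_1^2[1 - 2\Delta f(r/a)]$ with $f = (r/a)^g$ in the core, and lengths measured in units of the core radius a. All function names are hypothetical.

```python
import math


def shape(rho, g, truncated=True):
    """Profile shape f(rho), assuming n^2(r) = n1^2 * [1 - 2*Delta*f(r/a)]."""
    if truncated and rho >= 1.0:
        return 1.0          # cladding: n = n2
    return rho ** g         # power-law core (or infinitely extended profile)


def simpson(func, upper, n=4000):
    """Composite Simpson rule on [0, upper]; n must be even."""
    h = upper / n
    total = func(0.0) + func(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(i * h)
    return total * h / 3.0


def residual(w, V, g, truncated=True):
    """Left side of eq. (20) after eliminating beta^2 with eq. (19); w in units of a."""
    upper = 8.0 * w   # exp(-2 r^2 / w^2) is negligible beyond this radius

    def weight(r):
        return math.exp(-2.0 * r * r / (w * w)) * r

    # Eq. (19) rewritten for b = n1^2 k^2 - beta^2, using V^2 = 2*Delta*n1^2*k^2*a^2.
    i_f = simpson(lambda r: shape(r, g, truncated) * weight(r), upper)
    b = (4.0 / w**2) * (0.5 + V**2 * i_f)

    def integrand(r):
        bracket = (4.0 / w**2) * (r * r / w**2 - 1.0) - V**2 * shape(r, g, truncated) + b
        return bracket * (2.0 * r * r / w**2 - 1.0) * weight(r)

    return simpson(integrand, upper)


def solve_width(V, g, truncated=True, lo=0.3, hi=5.0, tol=1e-6):
    """Bisection for w/a; raises if the bracket does not straddle a root."""
    f_lo = residual(lo, V, g, truncated)
    if f_lo * residual(hi, V, g, truncated) > 0.0:
        raise ValueError("no sign change in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid, V, g, truncated)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    V = 2.4
    print("infinite parabolic :", solve_width(V, 2, truncated=False))
    print("truncated parabolic:", solve_width(V, 2))
    print("near step (g = 100):", solve_width(V, 100))
```

For the infinitely extended parabolic profile the bracketed expression vanishes identically at $w = a\sqrt{2/V}$, so the solver should return the value of (12); the truncated profile should give a somewhat wider field.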

It might be expected that the optimization procedure works best for
parabolic-index fibers, since it yields the exact result (12) and (13) for
infinitely extended parabolic index profiles. We expect that the
step-index profile presents the worst possible case. It is also a member of the
class of profiles given by (11) for r < a and by n(r) = $n_2$ for r > a and is
obtained in the limit $g \to \infty$. The result of solving (19) and (20) for g =
100 is shown as the dotted line in Fig. 2. At V = 2.4 the approximate
optimization procedure is in error by 3 percent. The percentage error
decreases for larger values of V (but for very large values of V, the
agreement becomes poorer once more). At V = 1.5 we have an error of
14 percent; smaller values of V are of little practical interest. We see that
the method works surprisingly well for the step-index fiber. Comparisons
of the accuracy of the method for other values of g are harder to make
since the exact field distributions are hard to obtain; no attempts were
made to evaluate the performance of the parameter optimization
procedure for smaller values of g. However, there is little doubt that the
results for values near g = 2 will be much better than the comparison
shown in Fig. 2.

Fig. 4. Normalized optimum beam width w as a function of V for the parabolic-index
fiber. The dotted line applies to an infinitely extended parabolic-index profile.

Fig. 5—w/a as a function of V for several values of the power law parameter g defined
by (11).

Fig. 6—Several types of splice imperfections.

Figure 4 shows the numerical solution of (19) and (20) for the truncated
parabolic-index profile, that is, for n(r) given by (11) with g = 2 for
r < a and by n(r) = n₂ for r > a. The dotted line shown in Fig. 4
applies to the infinitely extended parabolic-index profile and represents
the solution (12). It is clear that the field distribution of the truncated
parabolic-index profile is wider than the field of the infinitely extended
profile because that part of the field that extends into the cladding is no
longer under the focusing influence of the graded-index distribution.
For large V values both curves coincide; at V = 5 the difference of the
two curves is already reduced to 1 percent. The solid line in Fig. 4 can
be expressed by the empirical approximation


$$\frac{w}{a} = \left(\frac{2}{V}\right)^{1/2} + \frac{0.23}{V^{3/2}} + \frac{18.01}{V^{6}}. \tag{21}$$


This equation gives the width of the gaussian field profile that
best approximates the actual field distribution of a parabolic-index
fiber.

Figure 5 shows the optimum width w/a computed from (19) and (20)
for truncated graded-index profiles for several values of the exponent
g appearing in (11).


III. SPLICE LOSSES


Henceforth, we represent the fields of single-mode fibers by gaussian 
field distributions of the form (4) keeping in mind that the optimum 
width parameters w of the gaussian can be obtained as solutions of (19) 
and (20) for general graded-index fibers, or, explicitly, by (8) or (21) for 
step-index and parabolic-index fibers. 

The different types of splice defects are shown in Fig. 6. We allow both
fibers joined by the splice to have different parameters, shown as different
core diameters in Fig. 6. The actual differences may consist of different
refractive index distributions as well as different core diameters. For our
analysis, each fiber is represented by the width parameter of the optimum
gaussian field distribution; w₁ belongs to the fiber with radius a₁,
and w₂ belongs to the fiber with radius a₂.

The relevant formulas can all be derived by using (2) and (3) with the
fields of both fibers represented by gaussian field distributions of the
form (4). The E field in (3) is understood to be the gaussian field of the
first fiber transformed to the input plane of the second fiber. Such mode
matching calculations involving gaussian fields are not new; we present
here only the results.


3.1 Longitudinal fiber separation 


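For the splice shown in Fig. 6a, we find the power transmission
coefficient

$$T = \frac{4\left[4Z^2 + \dfrac{w_1^2}{w_2^2}\right]}{\left[4Z^2 + \dfrac{w_1^2 + w_2^2}{w_2^2}\right]^2 + 4Z^2\,\dfrac{w_2^2}{w_1^2}}. \tag{22}$$

The normalized fiber separation distance is defined as

$$Z = \frac{D}{n_2 k w_1 w_2}. \tag{23}$$

Two special cases are of interest. At D = 0, we have

$$T_0 = \left(\frac{2 w_1 w_2}{w_1^2 + w_2^2}\right)^2. \tag{24}$$

For D → ∞, we obtain asymptotically

$$T_\infty = \frac{1}{Z^2} = \left(\frac{n_2 k w_1 w_2}{D}\right)^2. \tag{25}$$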
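The separation formula and its two limits can be checked with a few lines of code. The sketch below is illustrative; the function name and argument order are assumptions, and n₂ is taken as the index of the medium filling the gap (1 for air).

```python
import math

def separation_transmission(D, w1, w2, wavelength, n2):
    """Power transmission (22) for longitudinal separation D (same length units)."""
    k = 2.0 * math.pi / wavelength           # free-space propagation constant
    Z = D / (n2 * k * w1 * w2)               # normalized separation (23)
    ratio = (w1 / w2) ** 2                   # w1^2 / w2^2
    numerator = 4.0 * (4.0 * Z * Z + ratio)
    denominator = (4.0 * Z * Z + ratio + 1.0) ** 2 + 4.0 * Z * Z / ratio
    return numerator / denominator

if __name__ == "__main__":
    # Identical fibers, w = 4.56 um, wavelength 1 um, index-matched gap.
    for D in (0.0, 10.0, 50.0, 200.0):
        print(D, separation_transmission(D, 4.56, 4.56, 1.0, 1.457))
```

Algebraically, (22) is the same as 4 / [(w₁/w₂ + w₂/w₁)² + 4Z²], which makes the limits (24) and (25) easy to see.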


3.2 Splices with tilt 


For the fiber tilt shown in Fig. 6b, we obtain the power transmission
coefficient

$$T = \left(\frac{2 w_1 w_2}{w_1^2 + w_2^2}\right)^2 \exp\left[-\frac{2(\pi n_2 w_1 w_2 \theta)^2}{(w_1^2 + w_2^2)\lambda^2}\right]. \tag{26}$$

When the tilt angle θ becomes large enough to make the exponent of the
exponential function in (26) unity, the transmitted power decreases to
1/e of its maximum value. This angle is given by the expression

$$\theta_e = \left(\frac{w_1^2 + w_2^2}{2}\right)^{1/2} \frac{\lambda}{\pi n_2 w_1 w_2}. \tag{27}$$


3.3 Splices with fiber offset 


The power transmission coefficient through the fiber splice shown in
Fig. 6c assumes the form

$$T = \left(\frac{2 w_1 w_2}{w_1^2 + w_2^2}\right)^2 \exp\left[-\frac{2d^2}{w_1^2 + w_2^2}\right]. \tag{28}$$

The amount of offset that reduces the transmitted power to 1/e of its
maximum value can be defined as

$$d_e = \left(\frac{w_1^2 + w_2^2}{2}\right)^{1/2}. \tag{29}$$

For identical fibers with w₁ = w₂, we obtain a very useful and interesting
relation by combining (27) and (29):

$$d_e \theta_e = \frac{\lambda}{\pi n_2}. \tag{30}$$

This expression is reminiscent of the uncertainty principle of quantum
mechanics, because it states that as one of two variables becomes smaller,
the other must become larger. If a single-mode fiber is designed with a
small value of Δ, to allow the field to spread out in transverse direction,
w becomes large and, consequently, d_e may be large, indicating that a
large offset can be tolerated. Equation (30) states that for large values
of d_e the tilt angle tolerance decreases. A fiber that is tolerant of large
offsets is intolerant with respect to tilts and vice versa.
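The trade-off between tilt and offset tolerance can be made concrete with a short sketch. The function names below are hypothetical; the formulas are (26) through (29), with n₂ the index of the medium between the fiber ends.

```python
import math

def mismatch_factor(w1, w2):
    """Common prefactor (2 w1 w2 / (w1^2 + w2^2))^2 of (26) and (28)."""
    return (2.0 * w1 * w2 / (w1 ** 2 + w2 ** 2)) ** 2

def tilt_transmission(theta, w1, w2, wavelength, n2):
    """Power transmission (26) for a tilt angle theta in radians."""
    exponent = 2.0 * (math.pi * n2 * w1 * w2 * theta) ** 2 \
        / ((w1 ** 2 + w2 ** 2) * wavelength ** 2)
    return mismatch_factor(w1, w2) * math.exp(-exponent)

def offset_transmission(d, w1, w2):
    """Power transmission (28) for a transverse offset d."""
    return mismatch_factor(w1, w2) * math.exp(-2.0 * d * d / (w1 ** 2 + w2 ** 2))

def tilt_tolerance(w1, w2, wavelength, n2):
    """1/e tilt angle (27)."""
    return math.sqrt((w1 ** 2 + w2 ** 2) / 2.0) * wavelength / (math.pi * n2 * w1 * w2)

def offset_tolerance(w1, w2):
    """1/e offset (29)."""
    return math.sqrt((w1 ** 2 + w2 ** 2) / 2.0)

if __name__ == "__main__":
    wavelength, n2 = 1.0, 1.457
    for w in (2.0, 4.0, 8.0):
        product = offset_tolerance(w, w) * tilt_tolerance(w, w, wavelength, n2)
        # For identical fibers the product is independent of w, as in (30).
        print(w, product, wavelength / (math.pi * n2))
```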


IV. DISCUSSION AND NUMERICAL EXAMPLES 

Throughout our discussion, we are using the width parameter w of the
field distribution (4). Experimental observations of the light field of a
single-mode fiber detect the light power instead of the field intensity.
The power may also be approximated by a gaussian distribution of the
form

$$P = P_0 \exp(-r^2/w_p^2). \tag{31}$$

The power width parameter w_p is related to the field intensity width
parameter w by the expression

$$w_p = \frac{w}{\sqrt{2}}. \tag{32}$$

This relation is important if the width of the mode field is known from
measurements instead of being inferred from the known V-value of the
fiber.

Fig. 7—Power transmission coefficient T as a function of the ratio w₁/w₂ of the beam
width parameters of the two fibers joined by a perfectly aligned splice.

We begin our discussion of splice losses by considering two perfectly
aligned fibers with different dimensions. Figure 7 shows a plot of the
power transmission coefficient T as a function of w₁/w₂. This function
is, of course, identical if plotted versus w₂/w₁. A ratio w₁/w₂ = 1.4 (or
0.71) causes a power loss of 10 percent. If we assume that we are dealing
with step-index fibers, we see from Fig. 2 that a reduction of the V-value
from V = 2.4 to V = 1.68 causes w/a to increase by a factor of 1.4. (An
increase of the V-value has far less influence on the beam size.) These
changes of V translate directly into changes of w only if a is kept constant
and V is changed by varying Δ. Now let us keep Δ constant and change
V from 2.4 to 1.68 by decreasing the value of the core radius a. This
change increases w/a by 1.4, which means that the beam width is actually
decreased by a factor of 0.98. This example shows that a change of the
core radius does not cause a proportional change of the beam width.
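The numbers in this paragraph can be reproduced approximately with the sketch below. It assumes that (8), which is not reproduced in this section, is the step-index fit w/a = 0.65 + 1.619 V^(-3/2) + 2.879 V^(-6); the function names are hypothetical.

```python
def step_index_width_ratio(V):
    """Assumed form of (8): normalized gaussian width w/a of a step-index fiber."""
    return 0.65 + 1.619 * V ** -1.5 + 2.879 * V ** -6

def aligned_mismatch_transmission(width_ratio):
    """T at D = 0, eq. (24), written in terms of the ratio w1/w2."""
    return (2.0 * width_ratio / (1.0 + width_ratio ** 2)) ** 2

if __name__ == "__main__":
    # Loss caused by a width ratio of 1.4 (the same value results for 1/1.4).
    print("T for w1/w2 = 1.4:", aligned_mismatch_transmission(1.4))

    # Change of w/a when V drops from 2.4 to 1.68.
    growth = step_index_width_ratio(1.68) / step_index_width_ratio(2.4)
    print("w/a grows by:", growth)

    # With Delta fixed, V is proportional to a, so a shrinks by 1.68/2.4 = 0.7
    # and the absolute width w changes by growth * 0.7.
    print("w changes by:", growth * 1.68 / 2.4)
```

With the rounded factor 1.4 used in the text the last product is 0.98; the unrounded fit gives a value a little below that.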

In the remainder of our discussion, we assume that the beam widths
of the guided modes of both fibers joined by the splice are identical,
w₁ = w₂. Figure 8 is a graph of (22) for a step-index fiber splice with
λ = 1 µm, n₂ = 1.457, and V = 2.4. The figure illustrates how insensitive
a fiber splice is to longitudinal separation of the fiber ends. However,
this figure was drawn under the assumption that both fibers of the splice
are immersed in index-matching fluid. For splices in air, we must set
n₂ = 1 in (22) and (23), which leads to lower values of T. The figure shows
that larger core radii result in lower splice losses. However, it must be
remembered that the figure is drawn for a fixed value of V = 2.4; fibers
with larger core radii thus have smaller values of Δ.

Fig. 8—Power transmission coefficient as a function of normalized longitudinal fiber
displacement for identical step-index fibers. The curve parameter is the ratio a/λ of the
core radius to free space wavelength.

The transmission coefficients for tilts and offsets are gaussian
functions of the tilt angle θ or the amount of offset d. Using normalized
variables, πn₂wθ/λ for the tilt and d/w for the offset, we can represent
both cases in Fig. 9. For w₁ = w₂ = w, (27) simplifies to

$$\theta_e = \frac{\lambda}{\pi n_2 w}, \tag{33}$$

and the amount of offset (29) that causes T to drop to 1/e = 0.368 of its
maximum value reduces to

$$d_e = w. \tag{34}$$


We illustrate the meaning of these expressions with a specific example.
Let V = 2.4, λ = 1 µm, and Δ = 0.002 so that we obtain a core radius of
a = 4.15 µm for n₂ = 1.457. For the step-index fiber, we find from (8) or
Fig. 2, w/a = 1.1 or w = 4.56 µm. (The corresponding value for the
parabolic-index fiber would be w = 4.48 µm.) The power transmission
coefficient would be T = 0.368 for d = w = 4.56 µm, or it would be T =
0.9 for d = 1.5 µm. If the axes of the fibers joined by the splice are
laterally aligned, but there is a tilt, a tilt angle θ = 0.048 radians = 2.7°
causes T = 0.368, while θ = 0.91° reduces T from its maximum value to
T = 0.9. A fiber with a narrower width parameter w would be less tolerant
of offsets, but correspondingly more tolerant of tilts. This mutual
relationship is expressed by the “uncertainty relation” (30).

Fig. 9—Power transmission coefficient as a function of normalized offset or tilt
angle.
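The example can be retraced numerically. The sketch below is illustrative: the variable names are assumptions, V = a k n √(2Δ) is used with n = 1.457, and the quoted value w/a = 1.1 is taken as given rather than recomputed. The results agree with the figures quoted in the text to within a few percent.

```python
import math

V = 2.4
wavelength = 1.0        # micrometers
delta = 0.002
n2 = 1.457

# Core radius from V = a k n sqrt(2 Delta), with k = 2 pi / wavelength.
core_radius = V * wavelength / (2.0 * math.pi * n2 * math.sqrt(2.0 * delta))

# Gaussian width from the quoted step-index value w/a = 1.1.
w = 1.1 * core_radius

# 1/e tolerances for identical fibers, eqs. (33) and (34).
theta_e = wavelength / (math.pi * n2 * w)
d_e = w

# Offset and tilt that leave T = 0.9: exp(-(x / x_e)^2) = 0.9.
scale = math.sqrt(-math.log(0.9))
d_90 = d_e * scale
theta_90 = theta_e * scale

if __name__ == "__main__":
    print("core radius a (um):", core_radius)
    print("width w (um):", w)
    print("theta_e (rad, deg):", theta_e, math.degrees(theta_e))
    print("offset for T = 0.9 (um):", d_90)
    print("tilt for T = 0.9 (deg):", math.degrees(theta_90))
```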


V. CONCLUSION 


Using the close match between gaussian beams and the field distri- 
butions of single-mode fibers, we have presented formulas and graphs 
for the power transmission coefficient of light through a fiber splice. The 
fibers on either side of the splice need not be identical. Splice losses occur 
for mismatched fiber parameters, transverse fiber displacement (offset), 
angular displacement (tilt), and longitudinal displacement®. All four 
cases have been discussed. Splice tolerances with respect to tilt and offset 
are mutually exclusive. This relationship has been expressed by an
“uncertainty principle.”

The results presented in this paper are immediately applicable to the 
excitation of single mode fibers by gaussian-shaped laser beams.78 
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Minimum Impulse Response 
in Graded-Index Fibers 


By J. S. COOK 
(Manuscript received October 15, 1976) 


A straightforward analysis slightly extending the work of Kawakami 
and Nishizawa and of Gloge and Marcatili provides some insight about 
the extent to which the mode differential delay in a graded-index fiber 
can be minimized. We show that an ideal fiber (no mode mixing) with 
uniform mode excitation and loss and uniform material dispersion can 
theoretically have an rms pulse broadening due to mode differential 
delay as small as about 0.02 LΔ²n₀/c. We suggest that further
improvement can result through recognition of differential mode loss and
by accurate control of the (non-zero) rate of change of dispersion with 
fiber index. 


I. INTRODUCTION


It is known through simple first-order analysis that differential delay 
between the propagational modes in multimode optical fibers can be 
greatly reduced by grading the optical index of the core so that the 
index 


$$n = n_0\left(1 - \Delta R^2\right), \tag{1}$$


where R is the fiber radius normalized to unity at the core-cladding
boundary. It is also known that even less differential delay can be realized 
theoretically by slightly perturbing the gradient from this parabolic 
shape. 

We have taken some direct steps based on existing analyses to deter- 
mine how much further improvement might be realized if the optimum 
gradient could be realized in an “ideal” fiber, where geometry is invariant 
over its length (no mode mixing) and where material dispersion is in- 
variant with radius. The approach is very simple and will be so stated, 
but the algebra is tedious and what little has been included will be found 
in the appendix.

In their very nice analysis, Kawakami and Nishizawa¹ showed that
an improvement in fiberguide impulse response could be obtained by
perturbing the parabolic profile through the addition of a small
fourth-order variation in the index gradient. They suggested that the
minimum pulse width would be obtained when the fourth-order
coefficient, δ, lies between the values 2/3, where all meridional modes are
synchronous, and 1, where circular spiral modes are synchronous. Minimum
total pulse width, τ, in fact occurs when δ = 2/3, and minimum rms width,
σ, occurs when δ = 5/7. Their expression for n can be written:


$$n = n_0\left[1 - 2\Delta R^2 + \delta(2\Delta)^2 R^4\right]^{1/2}. \tag{2}$$


It will be seen presently that when the index gradient is near optimum
it takes some care to keep track of which propagating modes are the
fastest and which are the slowest. The overall pulse width, τ, is simply
the difference in arrival time between the slowest and fastest modes. If
we plot τ vs δ [found by solving (5) for all mode numbers at each δ and
taking the difference between the extremes], the curve is continuous but
its derivative is not (Fig. 3 below). Personick? has pointed out that the 
rms pulse width, σ (the second moment of the received pulse), is more
useful for fiberguide system analysis than is τ:

$$\sigma^2 = \frac{\int t^2 p(t)\,dt}{\int p(t)\,dt} - \left[\frac{\int t\,p(t)\,dt}{\int p(t)\,dt}\right]^2, \tag{3}$$

where p(t) is the power arriving at a given point at time t. Note that
dσ/dδ is continuous; hence minimum σ can be found by direct
computation.

Gloge and Marcatili have shown in their analysis² that minimum total
pulse width occurs when

$$n = n_0\left[1 - 2\Delta R^{2(1-\Delta)}\right]^{1/2}. \tag{4}$$

For convenience, we introduce a multiplier, p, in the exponent of R in
(4), namely,

$$n = n_0\left[1 - 2\Delta R^{2(1-p\Delta)}\right]^{1/2}. \tag{4'}$$

By definition, minimum total pulse width occurs for p = 1; minimum rms
width, σ, however, occurs when p = 1.2.

These results are found by determining in each case the time of arrival
of energy propagating in the µ, ν mode (radial, azimuthal mode number)
with respect to the arrival time of the zero-order mode:

$$t(\delta) = \frac{L\Delta^2 n_0}{2cM^2}\left[\left(1 - \frac{3\delta}{2}\right)(2\mu + \nu + 1)^2 + \frac{\delta}{2}\left(\nu^2 - 1\right)\right], \tag{5}$$

$$t(p) = \frac{L\Delta^2 n_0}{2cM^2}\left[(2\mu + \nu + 1)^2 - pM(2\mu + \nu + 1)\right], \tag{6}$$

where L is the length of the fiber, c is free-space light velocity, and M
is the largest guided-mode number. In both cases,

$$M = (2\mu + \nu + 1)_{\max} \approx \pi n_0 (a/\lambda_0)\sqrt{2\Delta}, \tag{7}$$

where a is the core radius and λ₀ is the free-space wavelength of light.
Also, since we must include both E and H waves and two polarizations,
a fiber with near-parabolic core gradient carries a total of about M²
modes.

Fig. 1—Normalized time of arrival vs mode number (right) and mode count vs time of
arrival (left) for n = n₀[1 − 2ΔR² + (5/7)(2Δ)²R⁴]^{1/2}; i.e., δ = 5/7. (Figure annotation: σ′ = 0.031.)

Equation (3) can be solved easily if we assume all modes are equally
excited [p(µ, ν) = 1] and integrate over all modes. So

$$\sigma^2 = \frac{\int_0^M\!\int_0^{(M-\nu)/2} t^2(\mu,\nu)\,d\mu\,d\nu}{M^2/4} - \left[\frac{\int_0^M\!\int_0^{(M-\nu)/2} t(\mu,\nu)\,d\mu\,d\nu}{M^2/4}\right]^2. \tag{8}$$

Substitution of (5) and (6) into (8) (assuming M ≫ 1 to simplify the
algebra) and minimization with respect to δ and p, respectively, produce
the results already stated.
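The stated minima can be checked by evaluating (8) numerically. The sketch below is an illustration under the large-M assumption: mode numbers are normalized as µ′ = µ/M and ν′ = ν/M, delays are normalized to LΔ²n₀/c, the unit terms in (5) and (6) are dropped, and a simple midpoint rule is used over the triangular mode region. The function names are hypothetical.

```python
import math

def rms_width(delay, n=300):
    """Normalized rms width from (8) with uniform excitation (midpoint rule)."""
    s0 = s1 = s2 = 0.0
    for j in range(n):
        nu = (j + 0.5) / n                 # nu' in (0, 1)
        mu_max = (1.0 - nu) / 2.0          # upper limit of mu' for this nu'
        for i in range(n):
            mu = (i + 0.5) / n * mu_max
            t = delay(mu, nu)
            s0 += mu_max                   # cell area is proportional to mu_max
            s1 += mu_max * t
            s2 += mu_max * t * t
    mean = s1 / s0
    return math.sqrt(s2 / s0 - mean * mean)

def delay_delta(delta):
    """Large-M form of (5), normalized to L Delta^2 n0 / c."""
    return lambda mu, nu: 0.5 * ((1.0 - 1.5 * delta) * (2.0 * mu + nu) ** 2
                                 + 0.5 * delta * nu ** 2)

def delay_p(p):
    """Large-M form of (6), normalized to L Delta^2 n0 / c."""
    return lambda mu, nu: 0.5 * ((2.0 * mu + nu) ** 2 - p * (2.0 * mu + nu))

if __name__ == "__main__":
    print("parabolic, delta = 0 :", rms_width(delay_delta(0.0)))
    print("delta = 5/7          :", rms_width(delay_delta(5.0 / 7.0)))
    print("p = 6/5              :", rms_width(delay_p(1.2)))
```

Scanning δ or p around these values shows the rms width rising on either side, which is the behavior plotted in Figs. 3 and 4.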

We can substitute these minimum values back into (5) and (6) and
plot arrival time as a function of µ and ν, as shown in Figs. 1 and 2. Also
shown in these figures (on the left) is a plot of relative power vs time of
arrival, again assuming uniform excitation of the modes.

Figures 3 and 4 show τ′ and σ′ vs δ and p, respectively. The prime
denotes normalization with respect to LΔ²n₀/c.

Fig. 2—Normalized time of arrival vs mode number (right) and mode count vs time of
arrival (left) for n = n₀[1 − 2ΔR^{2(1−6Δ/5)}]^{1/2}; i.e., p = 6/5. (Figure annotations: relative power, p(t′); σ′ = 0.029.)

Fig. 3—Normalized total pulse width, τ′, and rms pulse width, σ′, vs δ for
n = n₀[1 − 2ΔR² + δ(2Δ)²R⁴]^{1/2}.

Fig. 4—Normalized total pulse width, τ′, and rms pulse width, σ′, vs p for
n = n₀[1 − 2ΔR^{2(1−pΔ)}]^{1/2}.


It is natural to question whether we can improve on these independent
functional perturbations from the parabola by an appropriate
combination thereof. Combination of (5) and (6) yields:

$$t(\delta, p) = \frac{L\Delta^2 n_0}{2cM^2}\left[\left(1 - \frac{3\delta}{2}\right)(2\mu + \nu + 1)^2 + \frac{\delta}{2}\left(\nu^2 - 1\right) - pM(2\mu + \nu + 1)\right]. \tag{9}$$

Substituting (9) into (8), as before, and minimizing with respect to both
δ and p produces: δ = 1/3, p = 2/3. Substituting these into (9) and neglecting
the 1s (large mode count) yields:

$$t = \frac{L\Delta^2 n_0}{2cM^2}\left[\frac{1}{2}(2\mu + \nu)^2 + \frac{1}{6}\nu^2 - \frac{2}{3}M(2\mu + \nu)\right]. \tag{10}$$

This is plotted in Fig. 5.
Now combine (2) and (4′) and let δ = 1/3 and p = 2/3 to find

$$n = n_0\left[1 - 2\Delta R^{2(1 - 2\Delta/3)} + \tfrac{1}{3}(2\Delta)^2 R^4\right]^{1/2} \tag{11}$$

as the near-optimum index gradient. Equation (11) can also be written

$$n \approx n_0\left[1 - \Delta R^2 - \Delta^2 \epsilon(R)\right], \tag{12}$$

which may be more convenient if we are seeking the excursion of the
improved gradient from the inverse parabola of (1):

$$\epsilon(R) = \frac{2}{3} R^2 \ln(1/R^2) - \frac{R^4}{6}. \tag{13}$$

This is plotted in Fig. 6.

Fig. 5—Normalized time of arrival vs mode number (right), and mode count vs time
of arrival (left), for n = n₀[1 − 2ΔR^{2(1−2Δ/3)} + (1/3)(2Δ)²R⁴]^{1/2}; i.e., p = 2/3, δ = 1/3.
(Figure annotations: relative power, p(t′); t′ = −0.08 contour; σ′ = 0.0216.)
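The equivalence of (11) and (12) to second order in Δ can be checked numerically. The sketch below is illustrative; it assumes δ = 1/3 and p = 2/3, takes ε(R) = (2/3)R² ln(1/R²) − R⁴/6, and uses hypothetical function names. The two expressions should differ only by terms of order Δ³.

```python
import math

def profile_exact(R, delta):
    """n / n0 from (11), with delta_4 = 1/3 and p = 2/3."""
    if R == 0.0:
        return 1.0
    exponent = 2.0 * (1.0 - 2.0 * delta / 3.0)
    return math.sqrt(1.0 - 2.0 * delta * R ** exponent
                     + (1.0 / 3.0) * (2.0 * delta) ** 2 * R ** 4)

def epsilon(R):
    """Excursion from the inverse parabola, eq. (13)."""
    if R == 0.0:
        return 0.0
    return (2.0 / 3.0) * R * R * math.log(1.0 / (R * R)) - R ** 4 / 6.0

def profile_approx(R, delta):
    """n / n0 from (12)."""
    return 1.0 - delta * R * R - delta * delta * epsilon(R)

def max_difference(delta, points=200):
    """Largest gap between (11) and (12) over 0 <= R <= 1."""
    return max(abs(profile_exact(i / points, delta) - profile_approx(i / points, delta))
               for i in range(points + 1))

if __name__ == "__main__":
    for delta in (0.005, 0.01, 0.02):
        print(delta, max_difference(delta), delta ** 3)
```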


II. CONCLUSION

The rms pulse width in an ideal fiber with parabolic gradient is

$$\sigma \approx 0.145\,\frac{L\Delta^2 n_0}{c}. \tag{14}$$

For the profile determined in this paper,

$$\sigma = 0.0216\,\frac{L\Delta^2 n_0}{c}. \tag{15}$$

Hence, nearly an order-of-magnitude improvement in fiberguide impulse
response over that produced by an inverse parabolic gradient could be
realizable in an ideal multimode fiber.

Uniform excitation of the modes was assumed for this analysis. If the
low-order modes tend to carry more power, even smaller σ could be
realized and Fig. 1 would suggest that ν → 0 and δ → 2/3 might be more
nearly optimum.

Olshansky and Keck‘ have shown that when material dispersion varies 
with radius, the optimum gradient is significantly different from para- 
bolic and from those discussed here. On the other hand, it can be shown 
that procedures similar to those indicated here can be carried out; and 
if the dependence of the dispersion on optical index could be controlled 
to a predetermined practical value, even less differential delay than that 
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Fig. 6—Index perturbation from inverse parabola for minimum rms pulse width [see 
eqs. (12), (13)]. 


indicated here could be realized. This does not necessarily contradict 
the results of the calculation by Arnaud and Fleming® based on an 
analysis® which includes tunneling modes at equal weight with the more 
clearly guided modes. They showed that if we assume a very specific 
non-zero dependence of dispersion on index, namely, that resulting from 
the inclusion of germania in silica, the differential delay of the optimum 
graded-index fiber is considerably degraded. We suggest, however, that 
if we carefully choose materials to enhance the index (a propitious mix 
of germania and phosphorous oxide, for example), we might provide a 
dispersion vs index dependence that would improve rather than degrade 
the mode differential delay, at least at a particular light wavelength. 
This goes well beyond our present ability to make measurements and 
control materials, however, and only leads us to conclude that im- 
provement in technology can potentially bring significant improvements 
in the information-carrying capacity of low-loss graded-index fibers. 
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APPENDIX 
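It is our purpose first to justify (5) and (6). Equation (5) comes from
Ref. 1, eq. (29), which translates directly to

$$\beta^2 = \frac{\omega^2 n_0^2}{c^2} - \frac{2\omega n_0\sqrt{2\Delta}}{ca}(2\mu + \nu + 1) + \delta\left[6\mu^2 + 6\mu(\nu + 1) + (\nu + 1)(\nu + 2)\right]\frac{2\Delta}{a^2}, \tag{16}$$

where a is the core radius.

This can be solved for β to find

$$\beta \approx \frac{\omega n_0}{c} - \frac{\sqrt{2\Delta}}{a}(2\mu + \nu + 1) - \frac{c\Delta}{\omega n_0 a^2}\left[\left(1 - \frac{3\delta}{2}\right)(2\mu + \nu + 1)^2 + \frac{\delta}{2}\left(\nu^2 - 1\right)\right]. \tag{17}$$

The largest propagating mode numbers can be found by recognizing
that the phase constant, β, can be no lower than the phase constant, k₂,
of an unbound wave in the cladding, where

$$k_2 = \frac{\omega}{c}\, n_0 [1 - 2\Delta]^{1/2} \approx \frac{\omega}{c}\, n_0 (1 - \Delta). \tag{18}$$

So

$$\beta_{\min} \approx \frac{\omega}{c}\, n_0 (1 - \Delta). \tag{19}$$

Eq. (19) yields

$$(2\mu + \nu + 1)_{\max} = M = \frac{\omega n_0 a}{c}\sqrt{\frac{\Delta}{2}}, \tag{20}$$

$$M^2 = \frac{\omega^2 n_0^2 a^2 \Delta}{2c^2}. \tag{21}$$

Substitution of (20) into (17) shows that each term on the right-hand
side thereof is of order Δ smaller than the previous.

The time of arrival of energy carried in any mode through length, L,
of fiber relative to the arrival of energy in the zero-order mode is

$$t = L\left[\frac{\partial \beta}{\partial \omega} - \frac{\partial \beta_0}{\partial \omega}\right]. \tag{22}$$

Differentiation of (17) and substitution of that [utilizing (20)] into (22)
produces eq. (5).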
Equation (6) is found from substitution of GM(17) [i.e., eq. (17) of
Gloge and Marcatili in Ref. 2] into GM(19), recognizing that the α of
Gloge and Marcatili is our 2(1 − pΔ). First, we must identify m/M of
GM(17). The symbols are unfortunate, but we find this equates to our
(2µ + ν + 1)²/M².

This is easier to justify than derive. All modes are associated with
positive mode numbers that lie in the µ, ν plane, as shown in Fig. 7. We
can plot the identity of (20) on the plane to bound the crosshatched area
in Fig. 7 within which lie all propagating mode numbers. The total
number of propagating modes is four times the total mode-number
combinations in that area, which can be seen by inspection to be M².
Gloge and Marcatili² showed that for their assumed functional index
variation, the phase constant depends on the total mode number
independent of the ratio of µ to ν. If, then, we draw a dotted line parallel to
the limit line, as in Fig. 7, the sum of all modes with phase constant
greater than that represented by those on the line are identified by
numbers that lie within the double crosshatched area. This Gloge and
Marcatili call m, and we call (2µ + ν + 1)². Their M is the total number
of propagating modes, which is our M². [We have neglected the extra (1)
since we have assumed M ≫ 1. We would have to be more precise if only
a small number of modes were involved.]

Fig. 7—All modes that lie under the line intercepting the µ axis at M/2, the ν axis at M,
are propagating modes.

We can now translate GM(17) to be

$$\frac{(2\mu + \nu + 1)^2}{M^2} = \left(\frac{\delta_{GM}}{\Delta}\right)^2$$

or

$$\delta_{GM} = \frac{\Delta(2\mu + \nu + 1)}{M}. \tag{23}$$

This in GM(19) results in (6).

Performing the integrations for (8) is simple but tedious. A helpful
formula, however, is

$$\int_0^{(M-\nu)/2} (2\mu + \nu)^n\, d\mu = \frac{M^{n+1} - \nu^{n+1}}{2(n + 1)}. \tag{24}$$

The evaluations of (8) for the two cases of (5) and (6) are

$$\sigma(\delta) = \frac{L\Delta^2 n_0}{c}\,\frac{1}{2\sqrt{3}}\left[\frac{1}{4} - \frac{2\delta}{3} + \frac{7\delta^2}{15}\right]^{1/2}, \tag{25}$$

$$\sigma(p) = \frac{L\Delta^2 n_0}{c}\,\frac{1}{2\sqrt{3}}\left[\frac{1}{4} - \frac{2p}{5} + \frac{p^2}{6}\right]^{1/2}. \tag{26}$$

For the combined case,

$$\sigma(\delta, p) = \frac{L\Delta^2 n_0}{c}\,\frac{1}{2\sqrt{3}}\left[\frac{1}{4} - \frac{2\delta}{3} + \frac{7\delta^2}{15} - \frac{2p}{5} + \frac{p^2}{6} + \frac{8\delta p}{15}\right]^{1/2}. \tag{27}$$

Minimizing these is straightforward.
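The minimization of the combined expression can be carried out exactly, because the bracket in (27) is quadratic in δ and p and its gradient is linear. The sketch below is illustrative; it assumes the bracket 1/4 − 2δ/3 + 7δ²/15 − 2p/5 + p²/6 + 8δp/15, uses exact rational arithmetic from the standard library, and the function names are hypothetical.

```python
import math
from fractions import Fraction as F

def bracket(delta, p):
    """Quadratic form inside the square root of (27)."""
    return (F(1, 4) - F(2, 3) * delta + F(7, 15) * delta ** 2
            - F(2, 5) * p + F(1, 6) * p ** 2 + F(8, 15) * delta * p)

def minimize_bracket():
    """Solve the two linear equations d/d(delta) = 0 and d/dp = 0 by Cramer's rule."""
    # (14/15) delta + (8/15) p = 2/3
    # (8/15) delta + (1/3)  p = 2/5
    a11, a12, b1 = F(14, 15), F(8, 15), F(2, 3)
    a21, a22, b2 = F(8, 15), F(1, 3), F(2, 5)
    det = a11 * a22 - a12 * a21
    delta = (b1 * a22 - a12 * b2) / det
    p = (a11 * b2 - a21 * b1) / det
    return delta, p

def normalized_sigma(delta, p):
    """sigma / (L Delta^2 n0 / c) from (27)."""
    return math.sqrt(float(bracket(delta, p))) / (2.0 * math.sqrt(3.0))

if __name__ == "__main__":
    delta_opt, p_opt = minimize_bracket()
    print("delta =", delta_opt, " p =", p_opt)
    print("sigma' combined :", normalized_sigma(delta_opt, p_opt))
    print("sigma' parabolic:", normalized_sigma(F(0), F(0)))
```

Setting p = 0 or δ = 0 in the same bracket recovers (25) and (26), so the single-parameter minima can be checked with the same function.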
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Simplified Theory of the Multimode 
Fiber Coupler 
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We simplify the coupling theory between two contiguous, parallel, 
multimode step-index fibers, describe the coupling concept, and derive 
an upper estimate for the overall coupling efficiency between the two 
fibers. The maximum coupling obtainable, according to this estimate, 
is less than 72 percent (—1.5 dB). The coupling efficiency derived for 
short coupling lengths shows good agreement with experimental re- 
sults. 


I. INTRODUCTION


A multimode fiber tap-coupler is a useful component for certain op- 
tical communication systems such as the optical data bus. Unfortunately, 
it is not an easy matter to evaluate the simultaneous coupling process 
between the hundreds or thousands of modes. An analysis of the problem 
has been given by Snyder and others.!~° These authors analyzed only 
the coupling between certain mode pairs. Snyder‘ recently reported the 
total power transition between multimode fibers. However, his conclu- 
sion is based on HE1,, modes for analyzing the crosstalk. 

We have derived a similar simplified expression for the total coupling 
between identical, contiguous, parallel, step-index, multimode fibers, 
which can expand to all modes under the restriction that two fiber cores 
are touching each other. We predict that the maximum coupling effi- 
ciency is less than 1.5 dB when all modes carry an equal amount of power. 
The distance between two fibers affects the coupling efficiency very 
seriously when fibers have large numerical apertures. 

Our simple formula agrees very well with experimental results, in spite 
of a large number of approximations made. 


II. COUPLING COEFFICIENT
2.1 Simplified coupling coefficient 
Figure 1 shows the geometry of the fiber coupler. The cores are parallel 
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Fig. 1—Geometry used to derive the coupling coefficients. 


to each other and surrounded by a medium that has the same refractive 
index as the cladding. 

A general description of strongly coupled soit igas fibers is given 
in Appendixes A and B. The coupling coefficient in eq. (24) has a com- 
plicated form. It involves the modified Bessel functions and many dif- 
ferent parameters that depend upon the eigenvalue equations. We will 
simplify the coefficient under the following assumptions: (i) the two
fibers are identical, (ii) only coupling between modes having the same
propagation constant is considered (see discussion in Section 2.2), and
(iii) the distance, d, between the two fiber axes is nearly equal to 2a,
where a is the core radius of the fiber.

We can rewrite the coupling coefficient by using assumptions (i) and
(ii). The result is

[Ca.B; | = [Caiai| 
V2A u? |Ko(wd/a) cos (2la) + Ko(wd/a)| 
a v3 Ki-1(w)Ki4.1(w) 


(1) 


where
γ = 1/2 when l = 0
γ = 1 when l > 0
α = (defined in Fig. 1)
a = radius of core
n₀ = refractive index of core
Δ = relative refractive index difference
d = distance between the two fibers
l = azimuthal order number
v² = w² + u²
v² = a²k²n₀²(2Δ).


The coupling coefficient (1) is still complicated because it includes the 
modified Bessel functions. Further simplification is necessary to present 
a simple physical picture for the coupling process. 

We use assumption (iii) to simplify (1) using the asymptotic expan- 
sions of the modified Bessel function. As discussed in Appendix C, the 
modified Bessel function term in (1) can be expressed by a very simple 
expression for a fairly large range of azimuthal order numbers. We use 
the average value of |C_i| as shown in Appendix C. The result is as
follows:

$$\overline{|C_{a_i b_i}|} = \overline{|C_{b_i a_i}|} = \gamma\,\frac{\sqrt{2\Delta}}{a}\,\frac{u^2}{v^3}\,\sqrt{\frac{2w}{\pi d/a}}\,\exp[-w(d/a - 2)], \tag{2}$$

where d/a ≈ 2 and γ = 1 for all modes except LP_0m; γ = 1/2 for LP_0m.
We let γ equal 1 for all modes. This assumption does not seriously
affect the results of this analysis. The coupling coefficient is thus

$$|C_{a_i b_i}| = \frac{\sqrt{2\Delta}}{a}\,\frac{u^2}{v^3}\,\sqrt{\frac{2w}{\pi d/a}}\,\exp[-w(d/a - 2)], \tag{3}$$

where d/a ≈ 2.


The coupling coefficient expressed by (3) does not explicitly depend upon
the azimuthal order number l. However, it is still dependent upon the
solution of the eigenvalue equation. We simplify eq. (3) further by
introducing simple expressions for u, w, and v.

The parameter v is expressed by the total number of modes in the
fiber when the total number of modes is large. The result is

$$v^2 = 2N, \tag{4}$$

where v ≫ 1 and N is the total mode number of fiber A and B.

We order all LP modes according to their z-propagation constants
from the largest to the smallest. We label them by the sequential
numbers, i = 1, ..., N. For example, the two orthogonal LP_01 modes are
designated as the first and the second mode. The LP_11 modes assume the
orders 3, 4, 5, and 6. For i ≫ 1, the cutoff value of u for the ith mode is
approximately

$$u_{\mathrm{cutoff}} = (2i)^{1/2} \quad (i \gg 1). \tag{5}$$

We replace u by u_cutoff and we obtain

$$u = (2i)^{1/2} \quad (i \gg 1). \tag{6}$$

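Equation (3) can be simplified by (4) and (6). The result is

$$|C_{a_i b_i}| = |C_{b_i a_i}| = \frac{2^{3/4}\Delta^{1/2}}{\sqrt{\pi}\, a\, N^{1/4}}\,\frac{i}{N}\left(1 - \frac{i}{N}\right)^{1/4}\frac{\exp[-(2N - 2i)^{1/2}(d/a - 2)]}{\sqrt{d/a}} \tag{7}$$

or

$$|C_{a_i b_i}| = \frac{2^{3/4}\Delta^{1/4}}{\sqrt{\pi}\, a\,(k n_0 a)^{1/2}}\,\frac{i}{N}\left(1 - \frac{i}{N}\right)^{1/4}\frac{\exp[-(2N - 2i)^{1/2}(d/a - 2)]}{\sqrt{d/a}},$$

where

a = radius of core
k = 2π/λ
n₀ = refractive index of core
Δ = relative refractive index difference
d = distance between the two fibers.

When d/a = 2, the coupling coefficient becomes

$$|C_i| = \frac{2^{1/4}\Delta^{1/4}}{\sqrt{\pi}\, a\,(k n_0 a)^{1/2}}\,\frac{i}{N}\left(1 - \frac{i}{N}\right)^{1/4}. \tag{8}$$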

This simple expression for the coupling coefficient does not require
the eigenvalue solutions. Figure 2 shows that the coupling coefficient
reaches a maximum for the mode number i = (4/5)N. The reason is the
following: the coupling coefficient as expressed by (24) is based on the
field interaction of the evanescent field tail of a mode of fiber A and the
core field of the same mode order in fiber B. Generally speaking, the
higher-order modes have a stronger field in the cladding relative to their
fields in the core. Therefore, the field interaction between the field tail
of a mode of fiber A and the core field of the same order in fiber B in (16)
can be expected to increase with mode order until it reaches a maximum.
Very high mode orders have very small fields in the core and therefore
the interaction between very-high-order modes decreases again.

Fig. 2—Coupling coefficient |C_i| vs mode order. (Figure annotations: parameter d/2a,
including d/2a = 1.01; NA = 0.4, n₀ = 1.457, λ = 0.633 µm, a = 50 µm.)

Figure 2 shows the effect of the gap between two fiber cores. Coupling
occurs only between the higher-order modes as the gap increases. We
will discuss the coupling efficiency using (7) and (8) in the next
sections.

2.2 Coupling efficiency 


The power coupled from the ith mode of fiber A to the ith mode of
fiber B can be obtained from (25) when the ith mode of fiber A carries
unit power and the ith mode of fiber B carries zero power at z = 0. The
coupled power is

$$(P_{A \to B})_i = \sin^2[|C_i| z], \tag{9}$$

where z is the coupling length.

The total coupled power is obtained by the summation of (9). (The
coupling coefficients for some of the modes are nearly equal to zero; in
particular, the coupling coefficients for one of the modes with l = 0 are
equal to zero. So the coupling efficiency defined here is an upper bound.)
If all modes of fiber A carry equal power, then the total power from fiber
A to fiber B is

$$P_{A \to B} = \frac{P_{in}}{N}\sum_{i=1}^{N} \sin^2(|C_i| z). \tag{10}$$

If we treat the mode number i as a continuous variable, the coupling
efficiency η becomes

$$\eta = \frac{P_{A \to B}}{P_{in}} = \frac{1}{N}\int_0^N \sin^2(|C_i| z)\, di. \tag{11}$$

The coupling between the near synchronous modes is not negligible,
especially when d/2a = 1. However, the upper bound of the coupling
efficiency is expressed by (11) when |C_ij||C_ji| ≈ |C_i|² for i − σ < j ≤
i + σ, (σ/N) ≪ 1, with σ defined by (25). When the two fiber cores are
touching each other, the coupling efficiency η is

$$\eta = \frac{P_{A \to B}}{P_{in}} = \frac{1}{N}\int_0^N \sin^2(|C_i| z)\, di = \int_0^1 \sin^2\left[\frac{2^{1/4}\Delta^{1/4}}{\sqrt{\pi}\,(k n_0 a)^{1/2}}\,\frac{z}{a}\, t(1 - t)^{1/4}\right] dt, \tag{12}$$

where

$$t = \frac{i}{N}.$$


Figures 3 and 4 show several examples. 
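The integral for touching cores is easy to evaluate numerically. The sketch below is illustrative; it assumes the integrand sin²[s t(1 − t)^(1/4)] with s standing for the product of the constant prefactor of (8) times a and the ratio z/a (the quantity written as 0.011 z/a in the discussion), and the function names are hypothetical.

```python
import math

def coupling_efficiency(s, n=20000):
    """Midpoint-rule estimate of the integral over t of sin^2[s t (1 - t)^(1/4)]."""
    total = 0.0
    for j in range(n):
        t = (j + 0.5) / n
        total += math.sin(s * t * (1.0 - t) ** 0.25) ** 2
    return total / n

def best_coupling(s_values):
    """Return the (s, efficiency) pair with the largest efficiency on a grid."""
    return max(((s, coupling_efficiency(s, n=2000)) for s in s_values),
               key=lambda pair: pair[1])

if __name__ == "__main__":
    eta = coupling_efficiency(3.67)
    print("efficiency at s = 3.67:", eta, "=", -10.0 * math.log10(eta), "dB loss")
    grid = [0.05 * j for j in range(1, 161)]     # s from 0.05 to 8.0
    print("largest efficiency on the grid:", best_coupling(grid))
```

The efficiency cannot reach unity because modes with different |C_i| complete their power transfer at different lengths, so no single z transfers all of them at once.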


III. DISCUSSION

As a first example, consider two fibers having the following parame- 
ters: NA = 0.2, a = 25 um (radius), no = 1.457, y = 0.633 um. This fiber 
carries about 1,250 modes (N = 1,250). 

The coupling coefficient is 


(GI = ( 1 VNA ) (t)(1 — t)!/4 exp [-V2N(V11 — t)(d/a — 2)] 


Va Vka ano a Vv d/2a 
_ 1 py XP [=2V2N (V1 ~ t)(d/2a — 1)) 
0.011 e (t)(1 —t) a : (13) 


According to Fig. 2, the maximum coupling efficiency (1.5 dB) is achieved 
at d/2a = 1 and 0.011 z/a = 3.67. Thus, the coupling length z is about 
334a or z = 8.36 mm. 
Fig. 3—Coupling efficiency vs coupling length for d/2a = 1 (cores touching), 1.01, and 1.05 (NA = 0.4, $n_0$ = 1.457, λ = 0.633 μm).


Fig. 4—Coupling efficiency vs coupling length for d/2a = 1, 1.01, and 1.05.


Fig. 5—Coupling efficiency vs coupling length; points indicate experimental results. Coupling efficiency is −10 log ($P_2/P_1$) and directivity is −10 log ($P_3/P_2$), both in decibels, plotted against coupling length in millimeters. Fiber: PK-760617-1, $n_0$ = 1.457, NA = 0.4, a ≈ 55 to 60 μm.


When d/2a = 1.01 so that the gap between the two fibers is about 0.5
μm, the coupling efficiency drops to 3.2 dB with the same coupling length
(about 8.36 mm). When d/2a = 1.05 so that the gap between the two fibers
is about 2.5 μm, the coupling efficiency becomes about 13.7 dB with
the same coupling length. Thus, the distance between two fibers affects
the coupling efficiency very strongly.
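
The sensitivity to the gap can be reproduced approximately by integrating (11) numerically. The sketch below is illustrative only: it assumes the coupling coefficient has the form |C_i| = (0.011/a) t(1 − t)^{1/4} exp[−2√(2N)√(1 − t)(d/2a − 1)]/√(d/2a) with t = i/N, as read from (13), and it uses a plain midpoint rule. The function names are hypothetical.

```python
import math


def coupling_coefficient_times_a(t, d_over_2a, n_modes, prefactor=0.011):
    """Dimensionless |C_i| * a for normalized mode number t = i/N.

    Assumed form (read from eq. (13)); prefactor 0.011 is the value
    quoted in the text for the first example.
    """
    decay = math.exp(-2.0 * math.sqrt(2.0 * n_modes)
                     * math.sqrt(1.0 - t) * (d_over_2a - 1.0))
    return prefactor * t * (1.0 - t) ** 0.25 * decay / math.sqrt(d_over_2a)


def coupling_efficiency_db(z_over_a, d_over_2a, n_modes, steps=20000):
    """Evaluate eq. (11) by the midpoint rule; return -10 log10(eta)."""
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        phase = coupling_coefficient_times_a(t, d_over_2a, n_modes) * z_over_a
        total += math.sin(phase) ** 2
    eta = total / steps
    return -10.0 * math.log10(eta)


# Coupling length from the text: 0.011 z/a = 3.67, N = 1,250 modes.
Z_OVER_A = 3.67 / 0.011
for ratio in (1.0, 1.01, 1.05):
    loss_db = coupling_efficiency_db(Z_OVER_A, ratio, n_modes=1250)
    print(f"d/2a = {ratio}: {loss_db:.1f} dB")
```

With these assumptions the computed values should fall near the 1.5, 3.2, and 13.7 dB figures quoted above; small differences are expected from rounding of the prefactor and of N.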

We now look at another example with the following fiber
parameters: NA = 0.4, a = 50 μm, $n_0$ = 1.457, λ = 0.633 μm. This fiber
carries about 20,000 modes. The coupling coefficient is expressed by


$$|C_i| = \frac{0.011}{a}\, \frac{t (1 - t)^{1/4}}{\sqrt{d/2a}} \exp \left[ -2 \sqrt{2N} \sqrt{1 - t}\, (d/2a - 1) \right]. \qquad (14)$$


The maximum coupling efficiency (about 1.5 dB) is achieved when d/2a
= 1 and the coupling length is 16.7 mm. However, if the ratio d/2a
changes by only 1 percent so that the gap between the fibers becomes
about 1 μm, the coupling efficiency becomes about 11.7 dB. The gap
between two fibers affects the coupling efficiency more strongly when
the fiber has a large numerical aperture and large radius.

Fig. 6—Coupling coefficient vs mode order; points indicate numerical results. (Solid curve: simplified coupling coefficient, $(i/N)(1 - i/N)^{1/4}$; dashed curve: limitation of coupling coefficient. Points: $LP_{lm}$ modes with cos lφ and sin lφ field distributions at α = 0 and α = π/2. Parameters: Δ = 0.00235, $n_0$ = 1.457, a = 12.10310 μm, λ = 0.633 μm, v = 12, d/2a = 1.0. Horizontal axis: $(u/v)^2$ or i/N.)

Practically, it is not easy to produce a fiber coupler that has a uniform 
interaction gap over a long coupling length. Figure 5 shows experimental 
results that only cover very short coupling lengths. Plastic-clad fiber is 
attached to the acrylic base and its cladding is peeled off over the re- 
quired coupling length. The exposed cores are pushed together and form 
a parallel coupling region. Silicone of the same type as the fiber cladding 
is injected to form a common cladding around the coupling region. Figure 
4 shows that this simple theoretical approach yields good agreement. 
Figures 6 and 7 show results obtained from a numerical calculation of
the coupling coefficient (23) for fibers with the following parameters:
Δ = 0.00235, a = 12.1 μm, $n_0$ = 1.457. The simplified theory shows good
agreement.

Fig. 7—Coupling efficiency vs coupling length; points indicate numerical results. (Points: numerical calculation; solid curve: simplified theory. Δ = 0.00235, $n_0$ = 1.457. Horizontal axis: coupling length in millimeters.)


IV. CONCLUSION 


We have derived a very simple formula for the coupling between two 
multimode fibers and have discussed the coupling mechanism and the 
coupling efficiency. We emphasize that the formula obtained involves 
some rather drastic approximations. However, this coupling formula 
explains the coupling mechanism very clearly and agrees with experi- 
mental and numerical results. 
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APPENDIX A 
Coupling Coefficient 


Coupling between two modes of different fibers was discussed by 
Snyder.!2 We use the formula derived by Snyder. 
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We assume the two fibers to be nearly identical so that the pth mode 
of fiber A can only couple to the qth mode of fiber B, provided the two 
modes have the same z-direction propagation constant. 

In this case, the coupling equations have the following form 


$$\frac{dA(z)}{dz} + j \beta A(z) = C_{AB} B(z)$$

$$\frac{dB(z)}{dz} + j \beta B(z) = C_{BA} A(z), \qquad (15)$$


where A(z) and B(z) are the amplitudes of the modes of fiber A and fiber 
B respectively. The coupling coefficient is defined by 


$$C_{AB} = \frac{\omega}{2} \int_{\text{core of fiber B}} \epsilon_0 (n_0^2 - n_e^2) E_A E_B \, dS, \qquad (16)$$
where
$n_0$ = core index of fiber B
$n_e$ = cladding index of fibers A and B
$E_A$ = the normalized electric field of fiber A
$E_B$ = the normalized electric field of fiber B.


Weakly guiding fibers have simple field expressions, which are called
the linearly polarized (LP) modes. Each $LP_{lm}$ mode represents a set of
four modes when l > 0. (When l = 0, $LP_{0m}$ represents two modes.) The
four modes differ in polarization and azimuthal field distribution (the
sin lφ or cos lφ term in the field-expansion equation). The field
components can be described by








zo/Nno aman 
E,=H =f 
af * leo/ne "| Ki (wr/a)K)(w) 
icceleoremigl: 8 os 0D) 
r=a 
where 


son 


_ (py oe uK)(w) 
(*) ( a aluV K)_-1(w)Ki+1(w)| ae 
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These fields satisfy the following eigenvalue equation 
$$u \left[ J_{l-1}(u) / J_l(u) \right] = -w \left[ K_{l-1}(w) / K_l(w) \right]$$
or
$$u \left[ J_{l+1}(u) / J_l(u) \right] = w \left[ K_{l+1}(w) / K_l(w) \right]. \qquad (18)$$


We calculate the coupling coefficient between the pth mode of fiber A
and the qth mode of fiber B by using (16), (17), and (18). The pth mode
field of fiber A is defined by $l_1$, $w_1$, $a_1$, r, φ, and the qth mode of fiber B
is defined by $l_2$, $w_2$, $a_2$, r′, φ′. The coupling coefficient is defined by (17).


© e(no— ne) 
a Ki, (w1) Ki.(we) 


Qn cos 11¢ (cos lod’) 
x dq’ 
J, ¢ fe 11¢ (sin lod’) | 


ag r r’ 
x f, rdr’Ki, (wi) Fis (u2--). (19) 
0 a1 ag 


We introduce the following theorem to change the coordinate (r, φ) to
the coordinate (r′, φ′) shown in Fig. 1:


CApBq = (Ai) Ale 


Ki, (wi) (or gh =, CDi (wr 2) tose (10 2) 
oe 


The coupling coefficient is, 


2 2 
Cin Oe Se) "px hh] 
ApBa = 9K, (w)Ki,(wo) (Ar) 71Q (11,12) 


x f, : r’dr’Ii, (v1 oa Ji, (us = 
0 ay ag 


@ _eg(ng = Ne) 
2 Ki,(w)Kig(wa) 12 n 1) 
ah ee 


w@ _eo(ng — Nem 
2 Ki,(w1)Ki,(w2) Ay,A1,[(Q (1, lo) |[R (14, L2)] 


where 


d 
Q(l1, Is) = (-1)8-2Ky,-1, (us a) cos (11 — lea 
1 


d 
+ (—1)442Ky 4), (wr ) cos (I; + laa 
a1 




a u a 
Ry be) =$— | 2 igs (usd in (wr) 
uo\2 W1\2 Lae ay 
ey) Ge 
ag ai 
w a 
a = Ji(us)li, $4 (1 ~) | 
ay ay 
If we substitute the eigenvalue equation into R(ly, l2), we obtain 
ag J),(u2) x 


(22) (2a) 
a9 ay 


R(1y, le) = 





Wo ag 
= Ki,41(Wo2)L1, (wi a) 
ae Qi 


Ww a 
+1 Kiplwadlinnr (wi) | (22) 
ay ay 


The final result for the coupling coefficient is 
= knoae 2AouU2 


a2 
CpBy = 7 [wokiye(walt +1 (v1 =) 
ay UD \V9Q ay 


Wy =2 Ky (we)Tige1 (wi 
ay Qa 
d 
{(-W)8-eK iat (1 a aig 


+ (-1)"*2Ky, 415 (w: “) cos (1, + La 
a  , 8) 


1/2 
(Kis wK1,+41wi)Kipts(w2) Kiya») ) 


APPENDIX B 
General Coupling Equation 


The coupling between two fibers, as shown in Fig. 1, is expressed by 
the following coupling equation with the coupling coefficient defined 
by (23). 


$$\frac{dA_p(z)}{dz} + j \beta_{A_p} A_p(z) = \sum_q B_q(z)\, C_{A_p B_q}$$

$$\frac{dB_q(z)}{dz} + j \beta_{B_q} B_q(z) = \sum_p A_p(z)\, C_{B_q A_p}$$
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|Ca,Bol = 


xX 


where 
Ap(z) 
B,(z) 
C'A,Bo 


Ba, 
BB, 

ly 

lo 
Ki(z) 
a1 

ag 

Ai 

Ae 


Y1, ¥2 


(y172) V2A; “ 
2 


© [Racutercley cos (1; — Ig)a + Kiy+1p (wid/ay) cos (11 + lea 
2 


[K1,-1(W1) Ki, 41(W1) Kig-1 (We) K154:1(we) |}? 


1u2 
Q4v0 1 





[ott (A) | 


, (24) 


$A_p(z)$ = the amplitude of the pth mode of fiber A
$B_q(z)$ = the amplitude of the qth mode of fiber B
$C_{A_p B_q}$ = coupling coefficient
$\beta_{A_p}$ = the z-direction propagation constant of the pth mode in fiber A
$\beta_{B_q}$ = the z-direction propagation constant of the qth mode in fiber B
$l_1$ = the azimuthal order number of the pth mode in fiber A
$l_2$ = the azimuthal order number of the qth mode in fiber B
$K_l(z)$ = the modified Bessel function
$a_1$ = the core radius of fiber A
$a_2$ = the core radius of fiber B
$\Delta_1$ = the normalized index difference between the core and the cladding of fiber A
$\Delta_2$ = the normalized index difference between the core and the cladding of fiber B
$\gamma_1, \gamma_2$: when $l_1 = 0$, $l_2 = 0$, $\gamma_1 = \gamma_2 = 1$; when $l_1 \ne 0$, $l_2 \ne 0$, $\gamma_1 = \gamma_2 = 2$.


The power transfer from the pth mode of fiber A to the qth mode of fiber
B, when the pth mode of fiber A carries unit power and the qth mode of
fiber B carries no power at z = 0, is given by

$$|A_p(z)|^2 = 1 - \kappa_{AB} \sin^2 \beta_{AB} z$$

$$|B_q(z)|^2 = \kappa_{BA} \sin^2 \beta_{AB} z, \qquad (25)$$

where

$$\kappa_{AB} = \left[ 1 + \frac{(\beta_{A_p} - \beta_{B_q})^2}{4 \left| (C_{A_p B_q})(C_{B_q A_p}) \right|} \right]^{-1}$$

$$\beta_{AB} = \left[ \frac{\left| (C_{B_q A_p})(C_{A_p B_q}) \right|}{\kappa_{AB}} \right]^{1/2}$$

$$\kappa_{BA} = \kappa_{AB}$$

z = the coupling length of two fibers.
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Strong coupling between two modes is expected only when the prop- 
agation constants are matched. Therefore the coupling between two 
multimode fibers, each of which carries hundreds of modes, can be an- 
alyzed by the coupling of modes whose propagation constants are 
matched. 
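
The effect of a propagation-constant mismatch can be illustrated with a small numerical sketch of the two-mode power transfer. It assumes the form κ = [1 + (β_A − β_B)²/(4|C_AB C_BA|)]⁻¹ and β_AB = (|C_AB C_BA|/κ)^{1/2} for (25); the function name and the sample values are hypothetical.

```python
import math


def power_transfer(c_ab, c_ba, beta_a, beta_b, z):
    """Return (|A(z)|^2, |B(z)|^2) for two coupled modes.

    Mode A carries unit power and mode B no power at z = 0.
    Assumed form of eq. (25): kappa is the maximum transferable
    power fraction and beta_ab the spatial beat constant.
    """
    product = abs(c_ab * c_ba)
    kappa = 1.0 / (1.0 + (beta_a - beta_b) ** 2 / (4.0 * product))
    beta_ab = math.sqrt(product / kappa)
    power_b = kappa * math.sin(beta_ab * z) ** 2
    return 1.0 - power_b, power_b


# Matched modes: complete transfer at z = pi / (2 |C|).
c = 0.5
print(power_transfer(c, c, 10.0, 10.0, math.pi / (2.0 * c)))

# Mismatch equal to 2|C|: at most half of the power is transferred.
beat = math.sqrt(2.0) * c
print(power_transfer(c, c, 10.0, 10.0 + 2.0 * c, math.pi / (2.0 * beat)))
```

The second case shows why only modes with nearly equal propagation constants contribute appreciably: a mismatch comparable to the coupling coefficient already halves the peak transfer.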


APPENDIX C 
Simplification of the Modified Bessel Function 


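The asymptotic expansion for the modified Bessel function is given
as follows:


$$K_l(z) \approx \sqrt{\frac{\pi}{2z}}\, e^{-z} \sum_{n=0}^{\infty} \frac{(l, n)}{(2z)^n}, \qquad (26)$$
where |z| is large, |arg z| < 3π/2, and

$$(l, n) = \frac{\Gamma \left( l + n + \tfrac{1}{2} \right)}{n!\, \Gamma \left( l - n + \tfrac{1}{2} \right)} = \frac{(4l^2 - 1^2)(4l^2 - 3^2) \cdots \left[ 4l^2 - (2n - 1)^2 \right]}{n!\, 2^{2n}}.$$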

The parameters w and wd/a are large for most modes except those very
close to the cutoff value. Therefore, most of the modified Bessel functions
can be expressed by the above asymptotic expansions. In particular, when
both order numbers of the mode are small (i.e., w is large), the modified
Bessel function can be expressed by the first term of the asymptotic
expansion. We obtain


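$$\frac{K_{2l}(wd/a) \pm K_0(wd/a)}{K_{l+1}(w) K_{l-1}(w)} \approx
\begin{cases}
2 \sqrt{\dfrac{2w}{\pi d/a}}\; e^{-w(d/a - 2)} \\[2ex]
0 \quad (\alpha = 90^\circ).
\end{cases} \qquad (27)$$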





We show below that (27) can also be used for large l. The coupling
coefficient has the following modified Bessel function term:


$$\frac{K_{2l}(wd/a) \pm K_0(wd/a)}{K_{l+1}(w) K_{l-1}(w)}. \qquad (28)$$
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We combine (26) into (28); then the first term is, 


Ko)(wd/a) _ 2w -w(d/a—2) 
Ki41(w)Ki-1(w) md/a 
a (21, n) 


n=0 (2w)"(d/a)” 

















o n (l+1,n—m)(l—1,m) 
Xx (2w)” 
a 1 a\n 1 
QW _wid/a—2) X, (on, (3) een) n}Q2n 
= —e WwW a XxX 
wd/a = 1 (4(1)2)" ae ae ae 
n=o (2w)” 2, (n — m)!m! 22” 
o | a n [4(1)2]” 7 
QW _ewd/ yao Qu) & 22" n! é 
= \/ — e-wld/a—2) 2 _-_______ (99) 
md/a = 1 2”[4(1)2]” 
n=0 (2w)” 22" n! 


If d/a = 2, the above equation is 


Ky(wd/a) 
Ki41(w)Ki-1(w) 


he (2)" 92n(4]2)n 








wy 2 gs etait) = UE AG ek 
ad/a > 1 Q7(4]2) 
(2w)" 22?n! 
2w 





~ 


~ —w(d/a—2) 30 
ad/a 2) 


Therefore, when l is large, the modified Bessel function term is expressed
by the above equation. Then,


$$\frac{K_{2l}(wd/a) \pm K_0(wd/a)}{K_{l+1}(w) K_{l-1}(w)} \approx \xi \sqrt{\frac{2w}{\pi d/a}}\; e^{-w(d/a - 2)}, \qquad (31)$$

where
ξ = 1 when l large
ξ = 2 when l small, w large (even)
ξ ≈ 0 when l small, w large (odd).
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Therefore, the coupling coefficient has the following limitation. 
es Q5/4A1/4 Ti 1 1/4 
0= (62) = (Cig eS Li ' 32 
1G sleuml= am [w| [tw] 2 


However, we assume that all Bessel function terms are expressed by 
the case £ = 1 as the average coefficient. The result is 


a 91/4 A 1/4 i 1 11/4 
C;| = —=—~ |=] }1--] . 33 
ha knora?/2 Fa | ul ~ 


C; = average coupling coefficient. 


where 
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This paper presents a detailed discussion of issues involved in the 
design of sub-band coders for low-bit-rate speech communications. 
Specifically, bit rates in the range of 7.2 to 16 kb/s are emphasized. 
Design guidelines, based on results of extensive computer simulations 
and subjective comparisons, are presented for selection of sub-band 
coder parameters. Practical considerations for selecting sub-bands 
under integer-band sampling and multiplexing constraints are also 
discussed, and a method for synchronous multiplexing of the sub-band
data, without buffering, is proposed. Several examples of sub-band
coders for transmission rates of 7.2, 9.6, and 16 kb/s are presented, and
the quality of these coders is compared against that of ADPCM and ADM
coders.


I. INTRODUCTION 


In recent work by Crochiere, Webber and Flanagan,! an approach to 
speech encoding has been proposed which is based on the partitioning 
of the speech band into sub-bands and encoding the sub-bands indi- 
vidually. The technique offers attractive possibilities for coding speech 
economically at bit rates in the range of 7.2 to 16 kb/s. At 16 kb/s good 
quality encoding, comparable to that of 26.5 kb/s adaptive differential 
(fixed predictor) PCM (ADPCM) encoding, is possible. Potential appli- 
cations exist in areas of narrow-band communications, mobile radio, and 
voice storage applications. 

When the bit rate is extended down into the upper data rate range of 
9.6 and 7.2 kb/s, moderate quality encoding can be achieved comparable 
to that of 19 and 18 kb/s adaptive delta modulation (ADM), respectively. 
Interesting potential applications exist for voice coordination on digital 
data lines and for secure voice communications by digital encryption and 
transmission over conventional data lines. 

In the design of sub-band coders, a variety of issues and trade-offs
must be dealt with. The number of sub-bands, the partitioning of
sub-bands (and gaps between bands), coder parameters, parceling of bits
among sub-bands, and compromises between bits/sample and bandwidth
are all variables that must be considered. In addition, a number of
constraints are introduced by practical considerations of multiplexing the
digitized sub-band signals and by considerations of efficient hardware
implementation. In this paper, we attempt to clarify these issues and
present useful criteria and guidelines for designing sub-band coders. In
many respects, the only truly meaningful criterion for selecting
parameters of the sub-band coder is a perceptual one. Therefore, design
criteria have been supported, as much as possible, by results of extensive
computer simulations and listener preference tests.

Fig. 1—(a) Implementation of a sub-band coder based on integer-band sampling. (b) Frequency-domain illustration of the sub-band partitioning of the speech band.


II. A REVIEW OF SUB-BAND CODERS


In the sub-band coder, the speech band is partitioned into sub-bands 
by bandpass filters. Each sub-band is low-pass translated, sampled at 
its Nyquist rate, and digitally encoded. By this process of dividing the 
speech band into sub-bands, each sub-band can be preferentially en- 
coded according to perceptual criteria for that band. On reconstruction, 
sub-band signals are decoded and bandpass translated back to their 
original bands. They are then summed to give a replica of the original 
speech signal. 
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Fig. 2—Integer-band sampling technique and a frequency-domain interpretation. 


A variety of techniques exists for performing the low-pass and bandpass 
translations. However, one approach is particularly attractive for 
hardware implementation since it eliminates the need for modulators. 
It is based on the integer-band sampling method proposed in Ref. 1 and 
will be the method primarily considered in this paper. 

The integer-band sampling implementation of the sub-band coder
is illustrated in Figs. 1 and 2. The speech band is partitioned into N
sub-bands by bandpass filters $BP_1$ to $BP_N$. It will be assumed in this
paper that the filters are discrete-time (e.g., digital or CCD) filters.
Typically four or five bands are used and, at lower bit rates, small gaps
are permitted between bands to conserve bandwidth and bit rate, as
illustrated in Fig. 1b.

The output of each filter in the transmitter is resampled at a rate of
$2f_i$, where $f_i$ is the width of the sub-band and i refers to the ith sub-band.
The sampled sub-band signals are digitally encoded and time
multiplexed for transmission over the digital channel. At the receiver the
digital signals are demultiplexed and decoded. The sub-band signals are
reconstructed by filtering the outputs of the decoders with another set
of bandpass filters, identical to $BP_1$ to $BP_N$, that act as interpolating
filters. Prior to this filtering, the sampling rates of the decoder outputs
are increased to the original sampling rate of s(n) by filling in with
zero-valued samples. The outputs of these filters are then summed to
give a reconstructed replica $\hat{s}(n)$ of the original speech signal s(n).
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The integer-band sampling scheme imposes certain constraints on 
the choice of sub-bands, as illustrated in Fig. 2. Sub-bands are required
to be between $m_i f_i$ and $(m_i + 1) f_i$, where $m_i$ is an integer. This constraint
is necessary to avoid aliasing in the sampling process.
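
The integer-band constraint can be checked mechanically for a candidate band. The sketch below is a minimal illustration, assuming band edges given as whole numbers of hertz; the function name `integer_band_index` is hypothetical.

```python
def integer_band_index(f_low_hz, f_high_hz):
    """Return m if the band [f_low, f_high] is an integer band, else None.

    A band of width f = f_high - f_low qualifies when its lower edge
    is an integer multiple m of f, so that it spans m*f to (m + 1)*f
    and can be sampled at 2*f without aliasing.
    """
    width = f_high_hz - f_low_hz
    if width <= 0:
        raise ValueError("upper band edge must exceed lower band edge")
    m, remainder = divmod(f_low_hz, width)
    return m if remainder == 0 else None


print(integer_band_index(800, 1600))   # band spans 1*800 to 2*800 Hz
print(integer_band_index(1600, 2400))  # band spans 2*800 to 3*800 Hz
print(integer_band_index(700, 1200))   # 700 is not a multiple of 500
```

A band that fails the test would have to be shifted or resized before integer-band sampling could be applied to it.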

Encoding in sub-bands offers several advantages over full-band cod- 
ing.! Quantization noise can be contained in bands to prevent masking 
of one frequency range by quantizing noise in another frequency range. 
Separate quantizer step-sizes are used in each band. Therefore, bands 
with lower signal energy will have lower quantizer step-sizes and con- 
tribute less quantization noise. Finally, the partitioning of the speech 
band into sub-bands enables the parceling of bits in bands according to 
perceptual criteria. In lower bands where pitch and formant structure 
must be accurately preserved, a larger number of bits/sample can be used 
for encoding, whereas in upper bands where fricatives and noise-like 
sounds occur in speech, fewer bits/sample can be used. 

In the following sections, we focus on the various issues involved in 
the design of sub-band coders. Section III addresses issues of coder
selection for sub-bands and the choice of their parameters. Trade-offs
involved in the allocation of bits among bands are also discussed. Section
IV deals with problems of sub-band partitioning of the speech spectrum 
under the constraints of integer-band sampling requirements and 
multiplexing requirements. Section V involves issues in the design of 
filters for the sub-bands. Finally, Section VI presents further results on 
comparisons of sub-band coder performance with other waveform coding 
methods. 


III. SELECTION OF CODERS AND CODER PARAMETERS FOR SUB-BANDS

Because encoders are individually tailored to each sub-band, a spectrum
of coders and parameters must be considered. For the lower-frequency
sub-bands, typically 3 or 4 bits/sample encoders are used, and
for upper bands 2 or fewer bits/sample are used. Since the characteristics
of the sub-band signals are considerably different from those of full-band 
speech, encoding techniques developed for encoding of full-band speech 
signals do not necessarily lead to good results for encoding of sub-band 
signals. In this section, we therefore address issues in the design of en- 
coders for sub-band signals and in the parceling of bits among bands. 

The choice of encoder parameters is determined in part by the static 
or long-term spectral characteristics of the speech waveform. Figure 3a 
illustrates typical long-term speech spectra (averaged over a sentence) 
based on measurements made by Beranek? and Dunn and White.* The 
same spectra are plotted in Fig. 3b with a warped frequency scale based 
on a constant (5 percent/division) contribution to the articulation index? 
in order to illustrate the relative perceptual importance of the various
frequencies. Two possibilities for sub-band selection for low and high
bit rates (to be discussed later) are illustrated above Fig. 3b. It is seen
that across the entire speech spectrum there is a characteristic drop in
power density with increasing frequency. Across any one band, however,
the drop in power density is relatively small. Since sub-bands are, in
effect, low-pass translated and sampled at their Nyquist rate, they
appear essentially as flat spectrum signals at the low sub-band sampling
rates and have essentially no sample-to-sample correlation. Figure 4
shows examples of sub-band signals for bands 1 to 4. Because of their
low sample-to-sample correlation, encoding is best performed by
adaptive PCM (APCM). Encoding based on differential or fixed prediction,
commonly used for full-band encoding, does not lead to good results
for encoding of sub-band signals.

Fig. 3—Long-term spectrum of speech based on measurements by Beranek and Dunn and White. (a) Logarithmic frequency scale. (b) Frequency scale based on a constant contribution to the articulation index.

Fig. 4—Typical waveforms of uncoded sub-band signals for bands 1 to 4. Eighty samples are plotted on each line.

The step-size adaption strategy used in simulations for the APCM 
coders is based on the one-word step-size memory approach proposed 
by Jayant, Flanagan, and Cummiskey.** The coder input signal, denoted 
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Fig. 5—Step-size adaption algorithm and quantizer characteristics of the APCM cod- 
ers. 


as $x_r$ for the rth sample, is quantized to one of $2^B$ levels according to the
quantizer characteristics shown in Fig. 5, where B is the number of bits
in the coder. The step-size adaption circuit examines the quantizer
output bits for the (r − 1)th sample and computes the quantizer step
size, $\Delta_r$, for the rth sample according to the relation


$$\Delta_r = \Delta_{r-1} M(L_{r-1}), \qquad (1a)$$
where

$$\Delta_{\min} \le \Delta_r \le \Delta_{\max}, \qquad (1b)$$
and where $\Delta_{r-1}$ is the step size used for the (r − 1)th sample. $M(L_{r-1})$
Table I — APCM coder parameters


B (bits/sample)   4      3      2      1 1/2   1 1/3   1 1/4
M1                0.9    0.85   0.85   0.92    0.92    0.92
M2                0.9    1.0    1.9    1.4     1.4     1.4
M3                0.9    1.0
M4                0.9    1.5
M5                1.2
M6                1.6
M7                2.0
M8                2.4

Typical
s/n (dB)          18     11.5   7      4       3.3     2.5


is a multiplication factor whose value depends on the quantizer magnitude
level $L_{r-1}$ at time r − 1. It can take on one of $2^{B-1}$ values,
$M_1, M_2, \ldots, M_{2^{B-1}}$. If the lower-magnitude quantizer levels are used at
time r − 1, a value of $M(L_{r-1}) = M_i$ less than one is used to reduce the
next step size. If upper-magnitude levels are encountered, a value of $M_i$
greater than one is chosen. In this way, the coder continuously adapts
its step size in an attempt to track the short-time variance of the input
signal. For practical reasons, the step size, $\Delta_r$, is constrained to be
between some minimum and maximum value $\Delta_{\min}$ and $\Delta_{\max}$,
respectively.
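
The adaption rule (1a) and (1b) can be sketched in a few lines. The example below is illustrative only: it assumes a uniform mid-rise quantizer (reconstruction levels at odd multiples of half the step size), which may differ from the characteristic of Fig. 5, and it uses the 3-bit multipliers listed in Table I. The function names are hypothetical.

```python
MULTIPLIERS_3BIT = [0.85, 1.0, 1.0, 1.5]  # M1..M4 for B = 3 (Table I)


def next_step(step, level, multipliers, step_min, step_max):
    """Eq. (1a) followed by the clamp of eq. (1b)."""
    return min(max(step * multipliers[level], step_min), step_max)


def apcm_encode(samples, multipliers, step_min, step_max, step_init):
    """Return (codes, steps); each code is (sign, magnitude level).

    Assumes a mid-rise uniform quantizer with len(multipliers)
    magnitude levels, i.e. 2**(B - 1) levels plus a sign bit.
    """
    top_level = len(multipliers) - 1
    step = step_init
    codes, steps = [], []
    for x in samples:
        level = min(int(abs(x) / step), top_level)
        sign = -1 if x < 0 else 1
        codes.append((sign, level))
        steps.append(step)
        step = next_step(step, level, multipliers, step_min, step_max)
    return codes, steps


def apcm_decode(codes, multipliers, step_min, step_max, step_init):
    """Rebuild samples; the step size is recovered from the codes alone."""
    step = step_init
    output = []
    for sign, level in codes:
        output.append(sign * (level + 0.5) * step)
        step = next_step(step, level, multipliers, step_min, step_max)
    return output


codes, steps = apcm_encode([100.0] * 30, MULTIPLIERS_3BIT, 1.0, 128.0, 1.0)
decoded = apcm_decode(codes, MULTIPLIERS_3BIT, 1.0, 128.0, 1.0)
print(steps[-1], decoded[-1])
```

Because the multiplier depends only on the transmitted magnitude level, the decoder tracks the step size without any side information, which is the main attraction of the one-word memory scheme.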

Typical values of M; for 2-, 3-, and 4-bit APCM coders are given in 
Table I. These values were determined experimentally and were found 
to agree reasonably well with values reported by Jayant‘ for encoding 
of full-band speech. As observed by Jayant, small changes in these values 
do not strongly affect the performance of the coders. Typical signal- 
to-quantizing noise ratios (s/n) found for encoding sub-band signals are 
also reported in Table I. 

An interesting modification to the above algorithm, proposed by 
Goodman, allows for encoding at an average bit rate of 1 + 1/K bits/ 
sample, where K is an integer. In this approach, the sign of the signal x, 
is encoded for each sample, r, and the magnitude of the signal is encoded 
with one bit every K samples. The step-size adaption is essentially that 
of (1) with M; and the quantizer magnitude level repeated for K — 1 
samples at the decoder. For example, if K = 2, a sign and a magnitude
bit are transmitted on odd numbered samples. On even numbered 
samples, only the sign bit is transmitted and the magnitude bit is as- 
sumed to be that of the previous sample. The sign bit transmits essen- 
tially the “zero crossing” or phase information and the magnitude bit 
conveys the amplitude information in the waveform at a reduced 
rate. 

The 1 + 1/K bit coder is found to be useful for encoding the uppermost 
bands when overall bit rates must be kept low. The upper bands contain 
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primarily the fricative and noise-like sounds in the speech and can 
therefore be quantized more coarsely than lower bands without a per- 
ceived loss in quality. Typical adaption parameters found to be useful 
for 1 + 1/K bit coders are also given in Table I. 

The quantities $\Delta_{\max}$ and $\Delta_{\min}$ in the above algorithms represent
practical constraints in the adaption logic. Their ratio determines the
dynamic range that the coder can handle and their absolute values
determine the center of this dynamic range. In simulations, a ratio
$\Delta_{\max}/\Delta_{\min}$ = 128 was consistently used, resulting in a useful dynamic
range of about 40 dB for the coders. The actual values of $\Delta_{\min}$ and $\Delta_{\max}$
must be different for each sub-band, however, to match properly the
dynamic range characteristics of the sub-band coder to that of the
long-term speech spectrum. This is easily seen in Fig. 3. Since upper
sub-bands have lower power densities than lower sub-bands, they should
have smaller values of $\Delta_{\max}$ and $\Delta_{\min}$ in their coders. A useful criterion
for choosing relative values of $\Delta_{\min}$ ($\Delta_{\max} = 128 \Delta_{\min}$) can be derived
by assuming that the power-density spectrum in sub-band i is
approximately flat across the band and has a value $S_i$. The long-term
variance, $\sigma_i^2$, of the sub-band signal is then proportional to $S_i f_i$.

To match the center of the dynamic range of the coders in each band,
$\Delta_{\min}$ should be selected to be proportional to the square root of the
long-term variance of the signal in that band. Therefore, the ratio of
$\Delta_{\min}$(band i) in band i to $\Delta_{\min}$(band j) in band j can be determined
as


$$\frac{\Delta_{\min}(\text{band } i)}{\Delta_{\min}(\text{band } j)} = \frac{\sigma_i}{\sigma_j} = \sqrt{\frac{S_i f_i}{S_j f_j}} \qquad (2)$$
or if values are expressed in dB, (2) becomes
$$\left. \frac{\Delta_{\min}(\text{band } i)}{\Delta_{\min}(\text{band } j)} \right|_{dB} = S_i|_{dB} - S_j|_{dB} + 20 \log \sqrt{\frac{f_i}{f_j}}. \qquad (3)$$


Equation (3) states that the ratio of minimum step size (in dB) of band
i to band j is equal to the difference in power densities (in dB) between
band i and band j plus a correction factor to account for the differences
in bandwidths. Values of $S_i$ and $S_j$ can be obtained from Fig. 3. Although
eq. (3) is only approximate, it serves as a useful criterion for choosing
relative values of $\Delta_{\min}$ for coders. Good agreement was found with
experimentally derived values.
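
A short calculation shows how the step-size ratio follows from the power densities and bandwidths. The numbers in the usage line are hypothetical and are not read from Fig. 3; the function name is likewise an assumption made for illustration.

```python
import math


def min_step_ratio_db(s_i_db, s_j_db, f_i_hz, f_j_hz):
    """Ratio of minimum step sizes, band i to band j, in dB.

    s_i_db and s_j_db are power densities in dB; the last term is
    the bandwidth correction 20 log10 sqrt(f_i / f_j).
    """
    return s_i_db - s_j_db + 20.0 * math.log10(math.sqrt(f_i_hz / f_j_hz))


# Hypothetical values: band i is 10 dB weaker but twice as wide as band j.
ratio_db = min_step_ratio_db(-10.0, 0.0, 800.0, 400.0)
print(f"{ratio_db:.2f} dB")
```

In this hypothetical case the wider bandwidth recovers about 3 dB of the 10-dB difference in power density, so the minimum step size of band i would be set roughly 7 dB below that of band j.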

A final consideration in the selection of coders for sub-bands relates 
to the questions of how many bits/sample should be allocated to each 
sub-band under constraints of fixed total transmission rate and how 
should the sub-band bandwidths and gaps between bands be traded 
against bits/sample for the coders. The answer to both questions is highly 
dependent on perceptual criteria and is greatly influenced by the overall 
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allowed transmission rate. Therefore, we do not propose to answer these 
questions in detail but simply provide some insight. 

A useful measure for assisting in the parceling of bits among sub-bands 
is the signal-to-quantizing noise ratio (s/n) as a function of frequency. 
Figure 6 shows typical s/n values as a function of frequency that are 
found to give preferred signal quality at bit rates of 16, 9.6, and 7.2 kb/s, 
respectively. At 16 kb/s it is found that good quality coding can be 
achieved with an allocation of 4 bits/sample (~18 dB s/n) in the lower 
sub-bands, 3 bits/sample (~11.5 dB) in the middle sub-bands, and 2 
bits/sample (~7 dB) in the upper sub-bands. Contiguous sub-bands are 
used. One possible choice of sub-bands is shown above Fig. 6 and will be 
discussed in greater detail in the next section. 

In the other extreme, moderate quality coding at transmission rates 
of 7.2 kb/s can be achieved by trimming the lowest band to 3 bits/sample, 
the second band to 2 bits/sample, and the upper bands to 14 or 144 
bits/sample. In addition, to conserve bandwidth, gaps may be allowed 
between sub-bands as shown in the band arrangement above Fig. 6. 
While these gaps introduce a slightly reverberant quality to the coder, 
the reverberation is generally preferred at this transmission rate to a 
further reduction in bits/sample and a corresponding increase in noise 
in the coders, which would be necessary if gaps were not present. 

At the intermediate transmission rate of 9.6 kb/s, a distribution of 3, 
2, and 14 bits/sample is possible across the frequency ranges, as shown 
in Fig. 6 by the solid line. A second alternative, which is also judged close 
in quality, is given by the dotted line. In this case, 3 bits/sample is used 
only in the lowest band and 2 bits/sample is used for encoding all upper 
bands. In both cases, gaps are allowed between bands, as shown above 
the figure. In listener preference comparisons, 63 percent of the listeners 
preferred the quality of the first bit/sample distribution (solid line) and 
37 percent preferred the quality of the second distribution (dotted line). 
A third approach was also tried at 9.6 kb/s, which involved 8 bits in the 
lowest band, 2 bits in the second band, and 1/4 bits in the two upper 
bands, with no gaps appearing between bands. In this way, the rever- 
berant quality of the coder was traded for slightly lower overall s/n. This 
approach was preferred by only 13 percent of the listeners over that of 
the first distribution (solid line) and by only 37 percent of the listeners 
over that of the second distribution (dotted line). Therefore, at 9.6 kb/s, 
a slight reverberant quality in the coder is preferred by listeners over 
the lower s/n obtained if no gaps between sub-bands are used. 

As observed in the above discussion, many “trade-offs” are possible 
and the only meaningful criterion for comparing them is a perceptual 
one. Often it is a matter of trading one type of distortion for another with 
the hope of finding a compromise that is most acceptable to the majority 
of listeners. 
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Fig. 6—Signal-to-quantizing noise ratio (s/n) as a function of frequency for bit allocations 
for 16-, 9.6-, and 7.2-kb/s coders.


IV. PARTITIONING OF THE SPEECH BAND INTO SUB-BANDS AND 
MULTIPLEXING OF DATA 

The selection of sub-bands involves a variety of considerations. Of 
preliminary interest is the number of bands. Next, bandwidths and 
locations of sub-bands must be chosen. This choice is strongly influenced 
by constraints imposed by the integer-band sampling technique and 
multiplexing requirements. In this section, we discuss these issues and 
present candidates for sub-band coders at various bit rates. 

Through simulations, a good compromise in the number of bands 
necessary for sub-band coding was generally found to be about four or 
five bands. When fewer than four bands are used, bandwidths become too
wide and do not allow for full utilization of the advantages of sub-band 
encoding. Designs with more than four or five bands tend to consume 
bandwidth in transition bands of filters in addition to requiring more 
hardware for practical implementation. 

The partitioning of the speech band into sub-bands presents a more 
difficult problem. A useful preliminary guideline for choosing sub-bands, 
suggested in Ref. 1, is to partition the speech band into sub-bands that 
represent approximately equal contributions to the articulation index 
(AI) under noiseless conditions. In this way each sub-band contains a 
significant portion of the important frequencies of the speech band. 
Lower sub-bands should have narrower bandwidths and bandwidths

Table II — Choice of bands for integer-band sampling
and 9.6-kHz sampling rate

Decimation
Ratio        f_i     2f_i    3f_i    4f_i
   1        4800    9600   14400   19200
   2        2400    4800    7200    9600
   3        1600    3200    4800    6400
   4        1200    2400    3600    4800
   5         960    1920    2880    3840
   6         800    1600    2400    3200
   7         686    1371    2057    2743
   8         600    1200    1800    2400
   9         533    1067    1600    2133
  10         480     960    1440    1920
  11         436     873    1309    1745
  12         400     800    1200    1600
  13         369     738    1108    1477
  14         343     686    1029    1371
  15         320     640     960    1280
  16         300     600     900    1200
  17         282     565     847    1129
  18         267     533     800    1067
  19         253     505     758    1011
  20         240     480     720     960
  21         229     457     686     914
  22         218     436     655     873
  23         209     417     626     835
  24         200     400     600     800
  25         192     384     576     768


should become progressively wider with increasing frequency. Gaps 
between sub-bands can also be determined by this criterion. The allo- 
cation of bits in sub-bands, however, is made according to subjective 
quality considerations, as discussed in the previous section. 

The integer-band sampling scheme imposes the constraint that the 
ratio of upper to lower band edges of sub-bands be (m_i + 1)/m_i, where
m_i is an integer that may be different for different bands (see Fig. 2). For
hardware considerations, it is required that the sampling rates for sub- 
bands be derivable from a common clock. Furthermore, for digital or CCD 
hardware implementations, it is desirable to relate these sampling rates 
to the sampling rate of the bandpass filters by ratios that are integers. 
Finally, the requirements for multiplexing digitally encoded sub-band 
signals dictate that the transmission bit rates of each sub-band be a ra- 
tional fraction of the total bit rate so that the data can be framed and 
synchronized. Also, a small fraction of this total bit rate must be reserved 
for synchronizing and framing information. 

This multitude of constraints greatly restricts the choices for sub- 
bands. To assist in the selection of sub-bands, it is helpful to construct 
tables such as Table II. It is assumed in Table II that the sampling rate 
of the bandpass filters is 9.6 kHz. Column 1 indicates the integer decimation
(reduction) ratios that relate sub-band sampling rates to 9.6 kHz.
Column 2 gives bandwidths, f_i, and column 3 gives 2f_i sampling rates
for the possible sub-bands. Columns 2 through 5 specify choices for band
edges m_i f_i (m_i = 1, 2, 3, ...). Therefore, all choices for sub-bands are
discernible from the tables once the sampling rate for the filters is 
chosen. Considerations in selecting sub-bands on the basis of articulation 
index, the distribution of bits/sample across bands, and the total 
transmission rate quickly reduce the choices of sub-bands further to only 
a few possibilities. The final choice is still not complete, however, without 
an analysis of multiplexing requirements. Practically, the transmission 
rate of each sub-band must be a rational fraction of the total bit rate so 
that the sub-band data can be multiplexed into a repetitive framed se- 
quence. The lowest common denominator of these rational fractions, 
including the fraction of transmission rate reserved for synchronization, 
determines the smallest frame size. 
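
The construction of such a table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `band_table` and the rounding to whole hertz are choices made here, not part of the original design procedure. For each decimation ratio d, the sub-band width is f_i = (filter sampling rate)/(2d), and the candidate band edges are the integer multiples m_i f_i.

```python
def band_table(filter_rate_hz, ratios, multiples=(1, 2, 3, 4)):
    """Return {decimation ratio: [m * f_i for m in multiples]}, rounded to whole Hz.

    f_i is the sub-band width; the sub-band sampling rate is 2 * f_i,
    which equals filter_rate_hz / ratio.
    """
    table = {}
    for d in ratios:
        f_i = filter_rate_hz / (2 * d)
        # int(x + 0.5) rounds half up, avoiding Python's round-half-to-even
        table[d] = [int(m * f_i + 0.5) for m in multiples]
    return table


if __name__ == "__main__":
    # Candidate bands for a 9.6-kHz filter sampling rate, ratios 1 to 25
    for d, edges in band_table(9600, range(1, 26)).items():
        print(f"{d:>3}", *(f"{e:>6}" for e in edges))
```

Any sub-band whose lower and upper edges are adjacent entries in one row automatically satisfies the (m_i + 1)/m_i edge-ratio constraint of integer-band sampling.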

To illustrate these points more clearly, it is helpful to analyze several 
examples of coders. Table III shows one choice of sub-bands that can 
be used for 9.6 and 7.2 kb/s four-band coders. The selection of sub-bands 
is obtained from Table II and corresponds to the low-bit-rate sub-band 
arrangement illustrated in Figs. 1(b), 3(b), and 6. As seen in Fig. 3(b) or 
Fig. 6, the bands all have approximately equal width on the warped 
frequency (constant AI) scale. The lowest sub-band is slightly narrower 
due to constraints imposed by integer-band sampling. A 107-Hz gap 
appears between sub-bands 2 and 3 and a 320-Hz gap appears between 
sub-bands 3 and 4, giving the coders a slightly reverberant quality. 

Coder examples A and B represent 9.6 kb/s coders with bit parceling 
among sub-bands according to distributions shown in Fig. 6 by solid and 
dotted lines for 9.6 kb/s. Example C is a 7.2 kb/s coder with the bit al- 
location in Fig. 6. Also included in Table III are sampling rate reduction 
(decimation) ratios and sampling rates for sub-bands. Relative values 
of minimum coder step-size (expressed in dB) that match the long-term 
speech spectrum, as discussed in Section III, eq. (3), are given in column 
5. Finally, typical s/n values observed for the examples are given at the 
bottom of the table. They were measured by comparing simulations with 
and without coders and represent distortions only contributed by coders 
and not due to band gaps or filtering. 
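
The bit-rate bookkeeping behind these examples can be checked with a short sketch. It assumes decimation ratios of 20, 10, 9, and 5 for the four sub-bands (these follow from the sub-band sampling rates of 480, 960, 1067, and 1920 Hz at a 9.6-kHz filter rate), and it treats whatever is left over after the sub-bands as the synchronization allowance. The names `bit_budget`, `EXAMPLE_A`, and `EXAMPLE_C` are illustrative only.

```python
from fractions import Fraction


def bit_budget(filter_rate_hz, total_rate_bps, bands):
    """bands: list of (decimation_ratio, bits_per_sample) pairs.

    Returns (per-band bit rates, remainder left for synchronization), in b/s.
    Each sub-band is sampled at filter_rate_hz / decimation_ratio.
    """
    rates = [Fraction(filter_rate_hz, d) * Fraction(b) for d, b in bands]
    sync = total_rate_bps - sum(rates)
    return rates, sync


# Bits/sample as listed for the 9.6-kb/s example A and the 7.2-kb/s example C
EXAMPLE_A = [(20, 3), (10, 3), (9, 2), (5, Fraction(3, 2))]
EXAMPLE_C = [(20, 3), (10, 2), (9, Fraction(4, 3)), (5, Fraction(5, 4))]

if __name__ == "__main__":
    for name, total, bands in (("A", 9600, EXAMPLE_A), ("C", 7200, EXAMPLE_C)):
        rates, sync = bit_budget(9600, total, bands)
        print(name, [round(float(r) / 1000, 2) for r in rates],
              "sync kb/s:", round(float(sync) / 1000, 2))
```

Exact rational arithmetic is used so that rates such as 9600/9 Hz do not accumulate rounding error before the synchronization remainder is computed.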

A fourth coder, example D, was designed for 16 kb/s. The design is
based on a filter sampling rate of 10.67 kHz (2/3 × 16), which gives the
choice of sub-bands shown in Table IV. This led to a slightly better
selection of sub-bands for the 16 kb/s coder and resulted in the five-band
coder design given in Table V. The sub-band selection corresponds to
that shown above Figs. 3(b) and 6. Lower sub-bands overlap slightly to
allow for transition bands of filters so that no gaps appear in this
frequency range.

Table III — Sub-band coder designs for 9.6 and 7.2 kb/s

       Decimate   Sub-band     Sampling   ΔMIN        Example A          Example B          Example C
       From       Band Edges   Rates      Ratios      9.6 kb/s coder     9.6 kb/s coder     7.2 kb/s coder
Band   9.6 kHz    (Hz)         (Hz)       (dB)        Bits/samp.  kb/s   Bits/samp.  kb/s   Bits/samp.  kb/s
1      20         240-480      480        0 (Ref.)    3           1.44   3           1.44   3           1.44
2      10         480-960      960        -3          3           2.88   2           1.92   2           1.92
3      9          1067-1600    1067       -8.5        2           2.13   2           2.13   1⅓          1.42
4      5          1920-2880    1920       -14         1½          2.88   2           3.84   1¼          2.40
Sync                                                              0.27               0.27               0.02
Total Bit Rate (kb/s)                                             9.60               9.60               7.20

Typical s/n (dB)                                                  10.8               10                 8.9

Table IV — Choice of bands for integer-band sampling
and 10.67-kHz sampling rate.

Decimation
Ratio        f_i     2f_i    3f_i    4f_i
   1        5333   10667   16000   21333
   2        2667    5333    8000   10667
   3        1778    3556    5333    7111
   4        1333    2667    4000    5333
   5        1067    2133    3200    4267
   6         889    1778    2667    3556
   7         762    1524    2286    3048
   8         667    1333    2000    2667
   9         593    1185    1778    2370
  10         533    1067    1600    2133
  11         485     970    1455    1939
  12         444     889    1333    1778
  13         410     821    1231    1641
  14         381     762    1143    1524
  15         356     711    1067    1422
  16         333     667    1000    1333
  17         314     627     941    1255
  18         296     593     889    1185
  19         281     561     842    1123
  20         267     533     800    1067
  21         254     508     762    1016
  22         242     485     727     970
  23         232     464     696     928
  24         222     444     667     889
  25         213     427     640     853
  26         205     410     615     821
  27         198     395     593     790
  28         190     381     571     762
  29         184     368     552     736
  30         178     356     533     711


Table V — Sub-band coder design for 16 kb/s

       Decimate    Sub-band     Sampling   ΔMIN        Example D
       From        Band Edges   Rates      Ratios      16-kb/s coder
Band   10.67 kHz   (Hz)         (Hz)       (dB)        Bits    kb/s
1      30          178-356      356        a2          4       1.42
2      18          296-593      593        0 (Ref.)    4       2.37
3      10          533-1067     1067       -6          3       3.20
4      5           1067-2133    2133       -11.5       2       4.27
5      5           2133-3200    2133       -18         2       4.27
Sync                                                           0.47
Total Bit Rate (kb/s)                                          16.00

Typical s/n (dB)                                               13.6


The analysis of the multiplexing requirements for coder examples A 
through D is summarized in Table VI. The required frame length for 
multiplexing is 180 bits for the 9.6-kb/s coders, 405 bits for the 7.2-kb/s 
coder and 135 bits for the 16-kb/s coder. The frame length corresponds

Table VI — Multiplexing and framing information for sub-band
coder examples

           Fraction of
Band       Total Bit Rate    Samples/Frame

Example A (9.6 kb/s)  Frame Length = 180 Bits
1          27/180            9
2          54/180            18
3          40/180            20
4          54/180            36
Sync       5/180             -

Example B (9.6 kb/s)  Frame Length = 180 Bits
1          27/180            9
2          36/180            18
3          40/180            20
4          72/180            36
Sync       5/180             -

Example C (7.2 kb/s)  Frame Length = 405 Bits*
1          81/405            27
2          108/405           54
3          80/405            60
4          135/405           108
Sync       1/405             -

Example D (16 kb/s)  Frame Length = 135 Bits
1          12/135            3
2          20/135            5
3          27/135            9
4          36/135            18
5          36/135            18
Sync       4/135             -

* See text.


to the number of bits that must be stored or transmitted before the 
multiplexing pattern repeats itself. It is determined by the lowest com- 
mon denominator of the fractions of total bit rate contributed by sub- 
bands and by the synchronization channel in column 2. If the frame 
length is too large, a different sub-band arrangement or bit allocation 
must be chosen. For example, in the 7.2-kb/s coder, only 1 bit in a frame
of 405 bits is reserved for synchronization. If the third sub-band is
quantized with 1¼ bits/sample, a frame length of 135 bits is possible with
2 bits reserved for synchronization. This is achieved, of course, at a cost
of a slightly reduced coder quality. Column 3 in Table VI gives the 
number of sub-band samples represented by each frame of data. 
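
The frame-length rule can be sketched as follows. The sketch expresses each sub-band's share of the total bit rate as an exact fraction, assigns the remainder to synchronization, and takes the least common multiple of the denominators. The function name `frame_length` and the (decimation ratio, bits/sample) encoding of each coder are illustrative assumptions; the reduced third-band allocation for the 7.2-kb/s coder is read here as 1 1/4 bits/sample.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def frame_length(filter_rate_hz, total_rate_bps, bands):
    """Smallest frame size (bits) for bands given as (decimation_ratio, bits_per_sample).

    Each share is (filter_rate / ratio) * bits / total_rate; the remainder is
    the synchronization share. The frame length is the least common multiple
    of all the reduced denominators.
    """
    shares = [Fraction(filter_rate_hz) / d * Fraction(b) / total_rate_bps
              for d, b in bands]
    shares.append(1 - sum(shares))  # synchronization share
    return reduce(lambda a, b: a * b // gcd(a, b),
                  (s.denominator for s in shares), 1)


A = [(20, 3), (10, 3), (9, 2), (5, Fraction(3, 2))]
C = [(20, 3), (10, 2), (9, Fraction(4, 3)), (5, Fraction(5, 4))]
C_ALT = [(20, 3), (10, 2), (9, Fraction(5, 4)), (5, Fraction(5, 4))]
D = [(30, 4), (18, 4), (10, 3), (5, 2), (5, 2)]

if __name__ == "__main__":
    print(frame_length(9600, 9600, A))                      # example A
    print(frame_length(9600, 7200, C))                      # example C
    print(frame_length(9600, 7200, C_ALT))                  # C with reduced band 3
    print(frame_length(Fraction(32000, 3), 16000, D))       # example D
```

The 10.67-kHz filter rate of example D is entered as the exact value 32000/3 Hz so that the shares reduce cleanly.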

The fact that the sub-bands are multiplexed in frames does not nec- 
essarily imply that a complete frame of data must be stored before 
transmission. By careful design of the multiplexer, it is possible to syn- 
chronously encode the sub-bands and multiplex them without buffering 
the data. One scheme for doing this, for coder example A, is illustrated 
in Table VII. The table depicts the bit allocation for one frame (180 bits)

Table VII — Synchronous multiplexing of coder example A






of data. The first column gives the bit or clock number (at a clock rate 
of 9.6 kb/s), the next four columns represent sub-bands, the last column 
represents the synchronization channel and the X’s represent allocated 
bits. Numbers and partitions in each sub-band column represent coder 
sampling times and sampling intervals. For example, in the first sub- 
band, nine samples of data (see Table VI) are coded with 3 bits/sample 
at appropriate clock times, 1, 21, 41, ..., 161. This corresponds to one
sample every 20 clock times, which is the decimation ratio of sub-band 
1 (see Table III). Within each sampling interval three slots (X’s) are 
allocated for transmission of these three bits and, therefore, they do not 
have to be stored for more than one sampling interval. In the fourth 
sub-band, bit allocations alternate between two slots and one slot per 
sampling interval according to the needs of the 1½-bit coder. A frame
sequence begins with the transmission of five synchronization bits. The 
sampling intervals of the sub-bands are offset in time so that these five 
bits can be transmitted together without conflict. The scheme could 
easily be implemented with the aid of a read-only memory (ROM). 
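
The slot counts described above can be verified with a small sketch. It reproduces only what the text states: the sampling interval of each sub-band in bit-clock ticks, the bits allotted per interval (with the fourth sub-band alternating between two slots and one), and five synchronization bits per frame. The actual time offsets between sub-bands are not reproduced here, and the names `SUB_BANDS`, `sample_clocks`, and `bits_in_frame` are illustrative.

```python
FRAME_BITS = 180   # frame length of coder example A
SYNC_BITS = 5      # synchronization bits sent at the start of each frame

# band -> (sampling interval in bit-clock ticks, repeating bits-per-interval pattern)
SUB_BANDS = {
    1: (20, [3]),
    2: (10, [3]),
    3: (9, [2]),
    4: (5, [2, 1]),   # 1 1/2 bits/sample realized as alternating 2 and 1 slots
}


def sample_clocks(ticks, frame_bits=FRAME_BITS, first_clock=1):
    """Clock numbers at which a sub-band is sampled within one frame."""
    return list(range(first_clock, first_clock + frame_bits, ticks))


def bits_in_frame(ticks, pattern, frame_bits=FRAME_BITS):
    """Total bits a sub-band contributes to one frame."""
    n_samples = frame_bits // ticks
    return sum(pattern[k % len(pattern)] for k in range(n_samples))


total = SYNC_BITS + sum(bits_in_frame(t, p) for t, p in SUB_BANDS.values())

if __name__ == "__main__":
    print(sample_clocks(20))   # sampling clocks of sub-band 1
    print(total)               # every one of the 180 slots is accounted for
```

Because the per-band counts and the synchronization bits fill the frame exactly, a fixed slot-to-band map (for instance in a ROM) is sufficient to drive the multiplexer.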

The synchronous multiplexing scheme is also useful as a means for 
conveniently ordering bits in a frame even if frames must be buffered 
for other purposes. Another potentially useful application of synchronous 
multiplexing occurs in an all-digital implementation, where coder 
hardware and possibly filter hardware can be shared between sub- 
bands. 


V. DESIGN AND IMPLEMENTATION OF THE FILTERS 


The parameters of the bandpass filters are depicted in Fig. 7. The
sub-band covers the frequency range from m_i f_i to (m_i + 1)f_i. For
practical reasons the filter passband must have a slightly narrower frequency
range from m_i f_i + Δf to (m_i + 1)f_i − Δf. A transition region, Δf, on the
order of 50 to 60 Hz was used in simulations with good results. Filters
are 175- to 200-tap FIR designs. If wider transition regions are allowed,
lower-order filters can be used at a cost of an increased reverberant
quality of the coder. A passband ripple of ±0.5 dB gives satisfactory
results in simulations.

Signal frequencies outside of the sub-band are aliased into the sub- 
band by the decimation process in the transmitter. This aliasing is il- 
lustrated by the dotted line in Fig. 7. With a filter stop-band attenuation 
on the order of 45 dB, this aliasing is not detectable. Near the sub-band 
edges, a slightly larger amount of aliasing can be allowed, as shown in 
Fig. 7, in order to keep the filter passbands as wide as possible. Filter 
attenuations of 12 dB at sub-band edges were used in simulations. Since 
two such filters are cascaded in the sub-band coder (see Fig. 1), this ali- 
asing is reduced by 24 dB at sub-band edges. It occurs only over a very 
narrow frequency range (a few Hz) and is not detectable. If lower filter

Fig. 7—Parameters of the bandpass filters.


orders (i.e., wider transition bands) are used, correspondingly larger 
attenuations should be used at band edges to compensate for the smaller 
slope of the filter roll-off in the transition regions. 

The overall frequency response of the sub-band coders was measured 
by computer simulations. Figure 8a shows results for a 175-tap FIR filter 
implementation of sub-bands in Table III. Similar results are observed
for IIR elliptic filters of order 6, 6, 8, 8 for bands 1 to 4, respectively. Phase 
distortions introduced by the IIR filters are not perceptible. In fact, the 
“smearing” of the phase helped to reduce the peak factor of the speech 
waveforms and led to a slightly improved performance (0.5 dB) in the 
adaptive coders. Figure 8b shows results of a 200-tap FIR filter imple- 
mentation of the five-band coder in Table V. 

In the receiver, the interpolating filters must have additional passband 
gain in order to restore the signal energy lost by decimation. The gains 
are equal to the decimation ratios. For example, if the sampling rate in 
the transmitter is decimated by 20, the interpolating filter must have a 
gain of 20 to account for signal energy lost in samples discarded in the 
decimation process. 
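
A small numerical check illustrates why the gain equals the decimation ratio. The sketch is a simplified baseband illustration, not the integer-band filter bank itself: a unit-amplitude cosine is decimated by keeping every 20th sample and setting the others to zero (the signal seen by the interpolating filter), and the amplitude remaining at the original frequency is measured by projecting onto the same cosine. The function name and the test frequency are assumptions made for the example.

```python
import math


def amplitude_after_zero_stuffing(freq_cycles_per_sample, ratio, n_samples):
    """Amplitude left at the original frequency after keeping every ratio-th sample.

    The input is cos(2*pi*f*k) with unit amplitude; discarded samples are
    replaced by zeros, as seen at the input of the interpolating filter.
    """
    w = 2.0 * math.pi * freq_cycles_per_sample
    kept = [math.cos(w * k) if k % ratio == 0 else 0.0 for k in range(n_samples)]
    # Projection onto the original cosine recovers the surviving amplitude
    return (2.0 / n_samples) * sum(v * math.cos(w * k) for k, v in enumerate(kept))


if __name__ == "__main__":
    amp = amplitude_after_zero_stuffing(0.01, 20, 2000)
    print(amp, amp * 20)   # surviving amplitude, and amplitude after a gain of 20
```

The surviving component is 1/20 of the original amplitude, so a passband gain of 20 in the interpolating filter restores it.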

Several hardware technologies are amenable to the implementation 
of sub-band coders. An attractive emerging technology, already men- 
tioned in Ref. 1, is the charge-coupled-device (CCD) technology.’ It offers 
possibilities for one or more filters on a chip with analog-to-discrete-time 
conversion accomplished essentially automatically. Filter outputs can

Table VIII — Comparison of sub-band coders vs ADPCM and ADM
(1-bit ADPCM) coding.

                                           Preference      Preference
                                           for Sub-band    for ADPCM
Coder Comparison                           Coder (%)       (%)
A. 16 kb/s sub-band coder (Example D)
   (1) 24 kb/s ADPCM (3 bit)               58              42
   (2) 32 kb/s ADPCM (4 bit)               34              66
B. 9.6 kb/s sub-band coder (Example B)
   (1) 10.2 kb/s ADM                       96              4
   (2) 12.9 kb/s ADM                       82              18
   (3) 17.2 kb/s ADM                       61              39
C. 7.2 kb/s sub-band coder (Example C)
   (1) 12.9 kb/s ADM                       79              21
   (2) 17.2 kb/s ADM                       56              44



be offered in a convenient sample-and-hold format. The technology may 
also be tractable for the implementation of the coders. 

All-digital technologies also offer many attractive possibilities for the 
sharing of hardware between sub-bands. Efficient computational 
methods are possible for implementing filters for decimating and in- 
terpolating digital signals.® Since digital or CCD filter cutoff frequencies 
are normalized to the filter-sampling frequencies, the bit rates of the 
coders can be varied over a limited range by simply varying the master 
clock frequency—a feat that cannot easily be accomplished with con- 
tinuous-time filter technologies. 


VI. SUBJECTIVE COMPARISONS WITH OTHER WAVEFORM CODING
METHODS

Further subjective comparisons have been made at 16 kb/s and 7.2 
kb/s in addition to comparisons reported in Ref. 1. Thirteen listeners 
were asked to compare pairs of sentences for quality and indicate which 
was better. Two speakers were used in the experiment and several 
comparisons of the same sentence pairs were made by each listener at 
different randomly selected times during the test. The results are sum- 
marized in Table VIII. 

In part A of Table VIII, the quality of the 16-kb/s sub-band coder 
(Example D) is compared against the quality of 24- and 32-kb/s ADPCM. 
It was preferred in 58 percent of the sentence pair comparisons against 
24-kb/s ADPCM and in 34 percent of the comparisons against 32-kb/s 
ADPCM. If the results are linearly extrapolated, the quality of the 16-kb/s 
sub-band coder can be said to be comparable to approximately 26.5-kb/s 
ADPCM. This is a significant improvement over earlier results reported 
in Ref. 1. It was obtained by allowing less overlap of the sub-bands and 
trading the extra bandwidth for more bits/sample in the lower sub-bands.

Fig. 8(a)—Measured frequency responses for 7.2- and 9.6-kb/s coders.


The 9.6-kb/s sub-band coder (Example B) is the same coder that was 
used for comparisons in Ref. 1. It is comparable to 19-kb/s ADM in 
quality. A slight improvement on this quality was observed from the 
sub-band coder in Example A. 

In part C of Table VIII, the 7.2-kb/s sub-band coder (Example C) is 
compared against 12.9- and 17.2-kb/s ADM. The quality is preferred over 
that of 17.2-kb/s ADM and, if the results are linearly extrapolated, it is 
found to be comparable to approximately 18-kb/s ADM. 
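
The equal-preference bit rates quoted above can be approximated by a straight line through two preference points. The sketch below is one plausible reading of "linearly extrapolated", not necessarily the exact procedure used: it finds the comparison bit rate at which the line crosses 50 percent preference. Which pair of points was used for the 9.6-kb/s coder is an assumption (the two highest ADM rates are taken here), and the function name is illustrative.

```python
def equal_preference_rate(rate_1, pref_1, rate_2, pref_2, target=50.0):
    """Bit rate (kb/s) at which a line through two points reaches the target preference.

    rate_*: bit rate of the reference coder; pref_*: percent preference for
    the sub-band coder at that rate.
    """
    slope = (pref_2 - pref_1) / (rate_2 - rate_1)
    return rate_1 + (target - pref_1) / slope


if __name__ == "__main__":
    print(equal_preference_rate(24.0, 58, 32.0, 34))    # 16-kb/s coder vs ADPCM
    print(equal_preference_rate(12.9, 82, 17.2, 61))    # 9.6-kb/s coder vs ADM
    print(equal_preference_rate(12.9, 79, 17.2, 56))    # 7.2-kb/s coder vs ADM
```

The two-point estimates come out close to the rounded figures given in the text (roughly 26.5, 19, and 18 kb/s, respectively).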

As seen by the above comparisons, a consistent advantage of about 
10 kb/s in transmission rate is obtained by the sub-band coder over 
ADPCM or ADM for the same quality. Alternatively, at the same bit rate 
an improved quality is possible with the sub-band coder.

Fig. 8(b)—Measured frequency response for 16-kb/s coder.


VII. CONCLUSIONS


The design of sub-band coders involves the consideration of a large 
number of parameters and “trade-offs.” For many of these parameters, 
no analytical means exist for choosing them in an optimal way. Conse- 
quently, in this paper we have attempted to provide some useful 
guidelines and insight for selecting parameters of sub-band coders. The 
guidelines are based on extensive computer simulations and subjective 
comparisons. 

A number of practical considerations involved in selecting sub-bands, 
multiplexing sub-band data, and implementing the filters have also been 


SUB-BAND CODER DESIGN 769 


discussed. Several sub-band coder designs have been proposed for bit 
rates of 7.2, 9.6, and 16 kb/s, and their performances have been compared 
with those of other waveform coding techniques.

The standard fixed sub-band coding scheme has been modified to
allow the center frequency of the two upper bands to vary in accordance
with the dynamic movement of the vocal tract resonances F2 and F3.
A relatively simple zero-crossing technique is used to measure the
formants F2 and F3. Through the use of this variable band coder, it is
possible to produce moderate-quality, intelligible speech at 4.8 kb/s
(quality is slightly less than that of a 7.2-kb/s fixed sub-band coder and
equal to that of about a 16-kb/s ADM coder). The reasonably good
intelligibility of the 4.8-kb/s variable-band coded speech can be attributed
to the coder’s attempt to capture and encode those spectral components
of the signal that are perceptually most significant (the region around
the formants). The major advantage of the variable-band scheme is that
its implementation is considerably less complex than other waveform
coding schemes or vocoder systems that can produce intelligible,
narrowband speech.


I. INTRODUCTION


Recently, a method for digitally coding speech signals in terms of 
sub-bands of the total spectrum was introduced that resulted in an im- 
provement in quality of the coded signal over that obtained from a single 
full-band coding of the total spectrum.1,2 The rationale for coding the
signal in sub-bands is based upon the experimental fact that quantizing 
distortion is not equally detectable at all frequencies, and hence, the 
quality of the coded signal can be significantly improved by controlling 
the distribution of quantizing noise across the signal spectrum. Coding 
the signal in sub-bands offers the possibility of achieving this control. 

In the recent work by Crochiere, Webber, and Flanagan,1,2 the selection
of the appropriate sub-bands was guided by the perceptual data
contained in the so-called articulation index (AI). The articulation index
denotes, on the average, the contribution of each part of the spectrum
to the overall perception of the spoken sound. For high-quality speech
at moderate bit rates (16 kb/s and greater), the frequency range 200 to 
3200 Hz was partitioned into four fixed contiguous bands that contrib- 
uted equally to the AI. The transmission bit rate of the sub-band coder 
could be lowered gracefully by limiting these sub-bands in width and 
by tolerating some spectral gaps that did not contribute significantly 
to the AI. However, carried to excess, the noncontiguous bands produced 
an unpleasant, reverberant quality in the signal that finally resulted in 
unacceptable speech. A bit rate of approximately 7.2 kb/s was found to 
be about the lowest bit rate that still produced acceptable, intelligible 
speech. (The quality at this bit rate was judged about equal to that of 
18-kb/s ADM speech.?) 

In using the AI in selecting sub-bands, it should be noted that this 
index only indicates the average perceptual contribution of each part 
of the spectrum. Since the speech spectrum is highly variable across a 
given utterance, it seems appropriate to select sub-bands that do not 
remain fixed but vary in accordance with the changing character of the 
speech. One way to achieve this goal is to allow the center frequency of 
the sub-bands to follow the variation of the formant frequencies across 
the utterance. The formant frequencies of a particular sound correspond 
to the resonance frequencies of its short time spectrum, and the fre- 
quency bands around these formants are perceptually the most signifi- 
cant regions of the spectrum. It is the purpose of this paper to show that 
by varying the sub-bands in accordance with the formant frequencies, 
it is possible to lower the bit rate to 4.8 kb/s and still maintain a speech 
quality that is approximately comparable to that of the 7.2-kb/s fixed 
sub-band coder. In addition, it is also shown that the formant frequencies 
can be sufficiently estimated for use in the variable-band scheme by a 
simple zero-crossing measurement technique. Thus, the variable-band 
coder can achieve very low data rates (4.8 kb/s) at considerably less ex- 
pense than conventional vocoder systems, while still providing an in- 
telligible signal. 


II. VARIABLE-BAND CODER


The concept of the variable-band coder is illustrated in Fig. 1. The 
speech band is divided into four sub-bands and encoded separately in 
each sub-band. The two lower sub-bands are fixed bands that cover the 
frequency range from approximately 250 to 820 Hz. This represents the 
region of primary speech energy for voiced sounds. The two upper sub- 
bands are variable bands (with fixed bandwidths) centered about the 
F2 and F3 resonance peaks of the short-time speech spectrum (as
illustrated by the dotted line). By varying the center frequencies of these two
bands as the short-time spectrum changes, the encoder attempts to
capture the maximum amount of speech energy and represent those

Fig. 1—Frequency domain interpretation of the variable-band coder.


frequencies that are perceptually most significant. Regions between the 
sub-bands are ignored to conserve bandwidth. While these gaps give a 
reverberant quality to the coder, the effect, as will be discussed, is not 
as pronounced as with a fixed-band scheme at the same bit rate. 

The implementation of the sub-band coder can be achieved by any 
of the modulation schemes suggested in Ref. 1. In particular, the most 
efficient approach for implementing the fixed bands is the integer band 
sampling method. For the two upper sub-bands, a modulation scheme 
is required in which the center frequency of the band can be varied. This 
can be accomplished with the complex modulation method discussed 
in Ref. 1. In addition, a method for adaptively varying the center 
frequencies of these bands is required. 
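
The idea behind the variable bands can be sketched with a minimal frequency-translation example. It is not the complex modulation structure of Ref. 1, whose filters and decimators are not reproduced here; it only shows the central step, multiplying by a complex exponential at the chosen center frequency so that energy at that frequency moves to 0 Hz, where a fixed low-pass filter (here a plain average) can select it. The function name, the 10-kHz sampling rate, and the tone frequencies are assumptions.

```python
import cmath
import math


def shift_to_baseband(samples, center_hz, sample_rate_hz):
    """Translate the band around center_hz down to 0 Hz (complex modulation)."""
    return [x * cmath.exp(-2j * math.pi * center_hz * n / sample_rate_hz)
            for n, x in enumerate(samples)]


def tone(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude cosine test tone."""
    return [math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


if __name__ == "__main__":
    fs = 10000.0
    x = tone(1500.0, fs, 200)
    # Tuned to the tone: averaging (a crude low-pass) leaves a constant of 0.5
    on_target = sum(shift_to_baseband(x, 1500.0, fs)) / len(x)
    # Tuned 500 Hz away: the tone lands at 500 Hz and averages out
    off_target = sum(shift_to_baseband(x, 2000.0, fs)) / len(x)
    print(abs(on_target), abs(off_target))
```

Changing `center_hz` from frame to frame retunes the band without redesigning the low-pass filter, which is what allows the two upper bands to follow F2 and F3.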

The overall configuration of the sub-band coder then takes the form 
shown in Fig. 2. The formant estimator determines the resonances F2 
and F3 in the speech band. This information is encoded at a low bit rate
and sent to the receiver. It is also decoded and used to control the vari- 
able-band center frequencies in the transmitter. In this way, the variable 
bands in the transmitter and receiver track identically. 

The measurement of the resonances F2 and F3 is accomplished by a
simple zero-crossing measurement technique. In this method, the
individual resonances are first isolated by filtering the speech signal into
frequency ranges appropriate to each formant.4 After filtering, the
resulting signal is ideally a damped sinusoid, and the formant frequency
can then be estimated by measuring the axis-crossing rate of the filtered
waveform. Figure 3 depicts the structure of the formant frequency
extraction system.* To correct for isolated errors in the formant extraction
scheme and to ensure that the measured formant trajectory is not
excessively rough, a median smoother is employed.6 It should be noted that
although this system of formant measurement is not as accurate as the
more elaborate methods of linear prediction7 or spectral estimation,4
the zero-crossing technique is computationally far less expensive than
these schemes and, moreover, there is no enhancement in quality in the
variable-band encoded signal when the more sophisticated measurements
of F2 and F3 are used to control the variable sub-bands.

* The formants are measured 50 times per second and can be efficiently coded using less
than 300 b/s by ADPCM techniques.5

Fig. 2—Implementation of variable-band coder with integer-band sampling for the fixed bands and complex modulation for the variable bands.

Fig. 3—Formant estimation scheme for estimating F2 and F3.

The sub-band signals are encoded with APCM encoders and the data 
are multiplexed together with the synchronization data and formant 
data, as illustrated in Fig. 2. Typically, more bits/sample are used for 
encoding lower sub-bands for the perceptual reasons explained in Refs.
1 and 2. Alternatively, a dynamic allocation of bits/sample can be
employed in a manner similar to that used by Noll for transform coding.8
Also, a slight amount of center-clipping can be used in sub-bands to re- 
duce idle channel noise. 


III. RESULTS OF COMPUTER SIMULATIONS


The sub-band coder system in Fig. 2 has been implemented by com- 
puter simulation for a transmission bit rate of 4.8 kb/s. Sub-band center 
frequencies and bandwidths corresponding to those in Table I were used. 
These bands also correspond to those shown in Fig. 1. 

The formants were estimated by the method in Fig. 3 and were used 
to control the center frequencies of bands 3 and 4. Figure 4 shows the 
variation of these center frequencies as a function of time for the sentence 


Table I—4.8-kb/s variable-band coder

                                        Bits/Sample Allocation
                                                  Dynamic
        Center
        Frequency    Bandwidth                            Unvoiced/
Band    (Hz)         (Hz)         Fixed      Voiced       Silence
1       356          213          3          3            1½
2       640          356          2          2            2
3       F2           320          1½         1½           2
4       F3           320          1½         1½           2

Fig. 4—Center frequencies of bands as a function of time for the sentence “High altitude
jets whiz past screaming.”


“High altitude jets whiz past screaming.” A comparison of this plot to 
the spectrogram in Fig. 5a shows that these center frequencies do in fact 
track the major F2 and F3 resonances in the speech signal. 

Two different bits/sample allocation schemes were tried in the
simulations, a fixed allocation and a simple dynamic allocation scheme. In
the fixed allocation, 3, 2, 1½, and 1½ bits/sample were used for encoding
sub-bands 1 to 4, respectively. In the dynamic allocation scheme, 3, 2,
1½, and 1½ bits/sample were used for encoding the voiced regions of the
speech signal for bands 1 to 4, respectively. For unvoiced/silence regions,
an allocation of 1½, 2, 2, and 2 bits/sample for bands 1 to 4 was used to
encode more accurately the stronger energy in the higher frequencies
during these intervals. A simple voiced/unvoiced decision was made by
observing the variable step size of the APCM coder in the lowest band.
If this step size was greater than five times its minimum allowed size,
then the speech was assumed to be voiced and the 3, 2, 1½, and 1½
bits/sample allocation was used. If it was less than five times the
minimum step size, then the unvoiced/silence condition was assumed and
the bits/sample allocation of 1½, 2, 2, and 2 was used.
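
The decision rule can be sketched in a few lines. Two assumptions are made and should be treated as such: the fractional allocations are taken to be 1 1/2 bits/sample, and each sub-band is assumed to be sampled at twice its bandwidth when the waveform bit rate is totaled. The text does not say what happens when the step size equals exactly five times the minimum; the sketch treats that case as unvoiced. All names are illustrative.

```python
VOICED = (3, 2, 1.5, 1.5)        # bits/sample for bands 1 to 4, voiced speech
UNVOICED = (1.5, 2, 2, 2)        # bits/sample for bands 1 to 4, unvoiced/silence
BANDWIDTHS_HZ = (213, 356, 320, 320)


def choose_allocation(step_size, min_step_size, threshold=5.0):
    """Pick the bits/sample allocation from the lowest band's APCM step size."""
    if step_size > threshold * min_step_size:
        return VOICED
    return UNVOICED


def waveform_bit_rate(allocation, bandwidths_hz=BANDWIDTHS_HZ):
    """Bit rate (b/s) of the sub-band data, assuming sampling at twice the bandwidth."""
    return sum(bits * 2 * bw for bits, bw in zip(allocation, bandwidths_hz))


if __name__ == "__main__":
    print(choose_allocation(12.0, 1.0), waveform_bit_rate(VOICED))
    print(choose_allocation(2.0, 1.0), waveform_bit_rate(UNVOICED))
```

Under these assumptions the two allocations give nearly identical waveform bit rates (they differ only through rounding of the bandwidths), so switching between them does not disturb the 4.8-kb/s transmission rate, and the remainder is available for formant and synchronization data.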

Figure 5 shows spectrograms of the resulting computer simulations. 
The original sentence is represented by the upper spectrogram of Fig. 
5a. Figure 5b corresponds to a sentence that was sub-band filtered 
(without encoding) with a fixed-band scheme (the two upper bands had 
center frequencies of 1200 Hz and 2300 Hz). In contrast, Fig. 5c shows

Fig. 5—Spectrograms of computer simulations. (a) Original. (b) Fixed-band filtered
speech (uncoded). (c) Variable-band filtered speech (uncoded). (d) Variable-band coded
with fixed bits/sample allocation. (e) Variable-band coded with dynamic bits/sample
allocation.


the same sub-band arrangement except that the center frequencies of 
the two upper bands were allowed to vary according to Fig. 4. Again, the 
sub-bands were not encoded but simply filtered. A comparison of these 
two spectrograms (Figs. 5b and 5c) shows that the variable-band scheme 
gives a better representation of the important spectral features of the 
speech signal than the fixed-band scheme for the same total bandwidths. 
For example, in the words “high” and “whiz,” the F2 resonance is lost 
in the gap between bands 3 and 4 of the fixed-band scheme; however, 
it is clearly present in the variable-band scheme. 

Figures 5d and 5e show the results of the variable-band coder with the 
fixed and dynamic bit allocations discussed earlier. By comparison of 
these sentences with the unquantized sentence of Fig. 5c, the effects of 
the quantization can be observed. Typically, the quantized sentences 
have spectrograms that are more “ragged” in appearance due to the
presence of the quantization errors and noise. Little difference in quality
was observed between the fixed and dynamic bit-allocation methods of
quantization, nor can any be observed on the spectrograms.
Only a slight improvement in quality, during unvoiced regions, is gained
by the dynamic bit allocation. More substantial improvements might
be possible through more sophisticated allocation schemes than the one
tried here (see Noll, Ref. 8).

The quality of the 4.8-kb/s variable-band coder was observed to be
only slightly less than that of the 7.2-kb/s fixed-band coder reported in
Ref. 2 (the 7.2-kb/s coder was rated equal in quality to an 18-kb/s ADM
coder). The movement of the two upper bands produced a noticeable
“swishy” noise in the background. This was more readily observed with 
earphones than with a loudspeaker. Also, the quantization noise of the 
coders gave a slightly hoarse sound to the speech. On the other hand, 
although the quality of the variable-band coder is only moderate, the 
intelligibility of the coded speech is still reasonably good because it at- 
tempts to capture and encode those spectral components of the speech 
signal that are perceptually most significant. 


IV. CONCLUSIONS 


The standard fixed sub-band coding scheme has been modified to 
allow the center frequencies of the two upper bands to vary in accordance 
with the dynamic movement of F2 and F38. The formants F2 and F3 are 
measured by a relatively simple zero-crossing technique. Using this 
variable-band system, it is possible to produce moderate-quality, in- 
telligible speech at 4.8 kb/s. 

The variable-band system can be viewed as a hybrid type coder that 
combines the simplicity of a sub-band coder with the low-bit-rate po- 
tential of a vocoder type system. The ability of the variable-band coder 
to achieve narrowband transmission is directly associated with its vo- 
coder-like utilization of the perceptually significant regions around the 
formants F2 and F3. But, unlike the vocoder, it is a true waveform coder 
that does not attempt merely to model the signal in terms of such fea- 
tures as pitch and vocal tract resonances.‘ It directly codes the entire 
250-Hz to 818-Hz region of the spectrum and two 320-Hz bands centered 
about the crudely estimated values of F2 and F3. The variable-band
coder can thus avoid the computationally expensive analysis-synthesis 
systems required of a vocoder, and can produce moderate-quality, in- 
telligible speech in a relatively inexpensive manner. 
This paper presents a study of speech scramblers based on permutations
of samples within an N-block. It has been found that a new
family of “uniform” (U) permutations (defined by the address mapping
i → ki modulo N; k prime to N) is as effective as pseudorandom (PR)
permutations in destroying speech intelligibility. Analytical results 
show the relation between input and scrambled-signal spectra, while 
computer simulations compare the effects of scrambling on PAM sam- 
ples and on codes based on ADM, APCM, and ADPCM. Scrambling is in- 
creasingly effective in that order, and encoding delays in ADPCM can 
be as low as 1 to 2 ms. Finally, scrambling has been compared with 
frequency inversion, which corresponds to sign-inversion in every other 
Nyquist-rate waveform sample. 


I. INTRODUCTION AND SUMMARY OF RESULTS


Waveform scrambling permits a conceptually simple method of speech 
encryption. In view of a certain ambiguity of definition in the subject, 
we define scrambling as a reversible temporal rearrangement of wave- 
form samples within a waveform block. Though specific scrambling 
methods have been well known and documented,!? no general study 
seems to have been made to relate a temporal permutation to the asso- 
ciated modification of the short-term frequency spectrum. Furthermore, 
for a scrambling block length N, the total number of permutations is N!;
while certain pseudorandom (PR) permutations constitute an effective 
subclass of these N! permutations, other interesting subclasses exist, and 
none of these has been discussed in the literature to our knowledge. Fi- 
nally, while it is known that effective speech encryption can result from 
scrambling several types of speech codes (from waveform coders or from 
linear predictive vocoders?), alternative speech codes have not been 
compared from the point of view of encryption potential. Such a com- 
parison would be particularly useful if the comparisons included alter- 
native methods for encoding speech at comparable bit rates—for ex- 
ample, speech waveform coding® using adaptive pulse code modulation 
(APCM), adaptive DPCM (ADPCM), and adaptive delta modulation 
(ADM). 

This paper discusses (i) the effect of a permutation on the input
spectrum shape; (ii) the generation of a new subclass of permutations
characterized by a simple algorithm and by desirable spectrum modifi- 
cations; (iii) comparison of the new permutation subfamily with shift- 
register-generated PR permutations? and with the classical technique 
of frequency inversion;! and (iv) comparison of scrambler performance 
on ADM, APCM, and ADPCM bits and on pulse-amplitude modulated 
(PAM) speech samples for block lengths varying from 4 to 64. 

For ADM bits the sampling rate was 24 kHz. The APCM and ADPCM 
coders used 8-kHz samples and 3 bits per sample. Thus, the bit rate of 
all three coders was 24 kb/s. The PAM samples were sampled at 24 kHz 
for scrambling and at 8 kHz for frequency inversion; in the time domain, 
frequency inversion corresponds to reversing the polarity of every other 
Nyquist-rate PAM sample. The Nyquist frequency for the 200- to 
3200-Hz speech sample used in our perceptual experiment was 8 kHz. 
With block-scrambling of 24-kHz samples or bits, a block length of 64 
samples or bits implies a time duration of 64/24 = 2.7 ms, while a block 
length of four samples represents a duration of 0.17 ms. 

Our study is addressed to issues of casual privacy as well as formal 
encryption. For privacy, the objective is to render speech as unintelligible 
as possible by means of one out of many possible transformations, so that 
only a receiver with knowledge of an inverse transformation can un- 
derstand the message. There is very little change of the transformation 
function with time. Recorded messages in a privacy system are amenable 
to cryptanalysis after sufficient processing. This kind of privacy also 
obtains in transformations on text! (where, however, the secrecy re- 
quirement is potentially greater since there are no restrictions such as 
those in a real-time communication link). 

In formal encryption, secrecy is maintained by a repeated change of 
the transformation procedure (referred to as the key), and the number 
of keys available becomes a measure of the effectiveness of the system. 
As a general mathematical study of encryption is already available,* we 
shall devote more attention to the specific issue of speech privacy— 
specifically, the problem of destroying speech intelligibility using 
transformations such as permutations. A formal speech encrypter would 
employ the strategy of switching from one permutation transformation 
to another in a fashion known only to the intended receiver. The technique
of frequency inversion is not useful for formal encryption because
there is only one key associated with it. On the other hand, we can use 
the technique of adding a masking signal,! such as modulo 2 addition 


782 THE BELL SYSTEM TECHNICAL JOURNAL, MAY-JUNE 1977 


of a pseudorandom binary sequence,” for formal speech encryption in 
real time. We have compared scrambling with this technique and have 
found that for speech applications where some processing delay is ac- 
ceptable, temporal scrambling may be preferable. This is because the 
key information in scrambling is very small, and it takes very little in- 
formation to transmit changes of keys. 

The rest of this section summarizes the results of this paper. The novel 
subclass of permutations proposed in this paper is defined by the simple 
mapping of sample-position r into position s, with 


$$s = k_1 r \pmod{N}, \qquad r = 1, 2, \ldots, N, \tag{1a}$$


where $k_1$ is prime to N and N is the number of samples in the waveform
block that is being scrambled. 

We refer to the one-to-one mapping above as uniform, or U, permu- 
tations (see Section II). The set of U permutations increases faster with 
block length N than the corresponding set of PR permutations; for N = 
8, it is larger by a factor of about 2, while for N = 128, this factor is 
nearly 7. 

The analytical results presented in the paper include an algorithmic 
relation between a permutation and the modification it produces in the 
DFT spectrum. Thus, if permutations of time samples are characterized 
by a matrix P (see Section II) and if F is the standard DFT matrix [given
by (5)], the transformation T of DFT samples is expressed by


$$T = F P F^{-1}. \tag{2}$$


For example, it has been shown that U permutations produce a uniform 
scrambling in the DFT spectrum as well. Descrambling is performed by 
another U permutation: 


$$r = k_2 s \pmod{N} \quad \text{if } k_1 k_2 = 1 \pmod{N}. \tag{1b}$$


Thus, for N = 32, if scrambling is done with $k_1 = 7$, descrambling will
need $k_2 = 23$.

Perceptual comparisons have been made of U permutations, PR per- 
mutations, and frequency inversions for PAM, ADM, APCM, and ADPCM 
samples. Results indicate the following increases in efficiency: 

(i) PAM < ADM < APCM < ADPCM for scrambling with a given block
length N.

(ii) ADM < PAM < APCM < ADPCM for frequency inversions.

The nature of the encrypted waveform is different between the first 
two and the last two codes, due conceivably to the fact that in PCM and 
DPCM, different bits have different weights associated with them. For 
scrambling, U permutations performed at least as well as PR permuta- 
tions, and, for one case of N = 32, in fact did better in informal perceptual 
comparisons. For longer block lengths, no meaningful comparisons could 
be made between the two scramblers, because intelligibility was totally 
destroyed in each case (except with PAM samples). We have also found 
that if the scrambling delay is sought to be less than about 2 ms, it be- 
comes essential to use APCM or ADPCM codes. 

This paper is arranged as follows: Section II presents analytical results 
on permutation transformations and compares U and PR permutations. 
Section III briefly discusses the classical technique of frequency inver- 
sion. Section IV summarizes perceptual results and spectrograms from 
a computer simulation. Section V provides a heuristic comparison of 
masking and scrambling as alternative techniques for encryption. 


II. PERMUTATION TRANSFORMATIONS
2.1 Permutation matrices 


The total number of permutations possible on a block of N samples 
is N!. If we also included the possibility of sign changes of the samples, 
this total is increased to $2^N N!$. As sign changes are easy to implement,
a general study of such transpositions should be useful. In this paper, 
however, we restrict ourselves to simple permutations only, except for 
the special case of sign changes of alternate samples (Section III). This 
corresponds to classical frequency inversion, and is therefore of interest 
for comparison with permutations (scrambling). 

Temporal sample permutations can be characterized by a permutation 
operator P that is a matrix of ones and zeros. Thus, for a block of N = 
5 samples, if bit positions 0, 1, 2, 3, and 4 are permuted to 0, 1, 4, 2, and 
3, we can denote the permutation by 


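[0, 1, 2 → 4 → 3 → 2]


or by the permutation matrix


$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$


Expressing P as $[2 \xrightarrow{P} 4 \xrightarrow{P^2} 3 \xrightarrow{P^3} 2]$ gives the order of the cyclic group
defined by P directly; since $P^3$ brings 2 back to 2, the order is 3. In other
words, $P^3$ is an identity matrix I.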
If a permutation is defined in terms of more than one cycle, it can be 
seen that it has an order equal to the least common multiple (lcm) of 
(order of cycle 1, order of cycle 2, ...). For example, the permutation
[0 → 3 → 5 → 0, 2 → 4 → 6 → 1 → 2] has an order = lcm (3, 4) = 12, and this
permutation is characterized as a P(3, 4) permutation. The number of 
repeated P(3, 4) permutation operations that will bring us back to an 
original unpermuted sequence is 12. (The permutation in the earlier 
example was of the P(1, 1, 3) type; in that case, three successive P(1, 1, 
3) operations would lead to the unpermuted original sequence.) In 
general, if 


$$P: \; P(p_1, p_2, p_3, \ldots),$$
where
$$p_1 + p_2 + p_3 + \cdots = N, \tag{3a}$$


the number of repeated P permutations that would result in the original 
unpermuted sequence [equivalently, the number of distinct mappings 
L(P) that can be generated using P] is 


$$L(P) = \operatorname{lcm}(p_1, p_2, p_3, \ldots). \tag{3b}$$
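The order rule in (3b) can be checked mechanically. The following is a minimal sketch, not part of the original study; the function names `cycle_lengths` and `order` are illustrative assumptions, and a permutation is written as a list in which entry `i` holds the position that `i` maps to.

```python
from math import gcd

def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a list: i -> perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths

def order(perm):
    """Order L(P): least common multiple of the cycle lengths, as in (3b)."""
    result = 1
    for p in cycle_lengths(perm):
        result = result * p // gcd(result, p)
    return result

# Five-sample example [0, 1, 2 -> 4 -> 3 -> 2], of type P(1, 1, 3).
p_small = [0, 1, 4, 2, 3]
# Seven-sample example [0 -> 3 -> 5 -> 0, 2 -> 4 -> 6 -> 1 -> 2], of type P(3, 4).
p_large = [3, 2, 4, 5, 6, 0, 1]

print(sorted(cycle_lengths(p_small)), order(p_small))
print(sorted(cycle_lengths(p_large)), order(p_large))
```

The cycle lengths always sum to N, which is the constraint (3a).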


2.2 Effect of permutations on frequency spectrum 


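Let $x(n)$, $n = 0, 1, \ldots, N-1$, be the discrete input block of length
N, which is to be permuted. Let $X(m)$, $m = 0, 1, \ldots, N-1$, be its DFT,
and let x and X denote the corresponding column vectors
$[x(0), x(1), \ldots, x(N-1)]'$ and $[X(0), X(1), \ldots, X(N-1)]'$, respectively. Note that


$$X = F x, \tag{4}$$

where

$$F = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W & W^2 & \cdots & W^{N-1} \\ 1 & W^2 & W^4 & \cdots & W^{2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W^{N-1} & W^{2(N-1)} & \cdots & W^{(N-1)(N-1)} \end{bmatrix} \tag{5}$$


with the standard DFT identities

$$W = \exp\left(-j\,\frac{2\pi}{N}\right), \qquad W^N = 1, \qquad \text{and} \qquad \sum_{r=0}^{N-1} W^r = 0. \tag{6}$$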

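Let $X_P$, the DFT of the permuted sequence, be described as a transformation
of X:

$$X_P = T X = T F x. \tag{7a}$$


Also, in the manner of (4),


$$X_P = F P x. \tag{7b}$$

From (7a) and (7b), $FP = TF$, or

$$T = F P F^{-1} \tag{8a}$$

and

$$P = F^{-1} T F, \tag{8b}$$

where the inverse of F can be seen to be

$$F^{-1} = \frac{1}{N} \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W^{-1} & W^{-2} & \cdots & W^{-(N-1)} \\ 1 & W^{-2} & W^{-4} & \cdots & W^{-2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W^{-(N-1)} & W^{-2(N-1)} & \cdots & W^{-(N-1)(N-1)} \end{bmatrix}. \tag{9}$$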


Theorem: If P is represented in terms of a, b, c, d in the earlier
characterization [0 → a, 1 → b, 2 → c, 3 → d, ...],


$$T_{rs} = \frac{1}{N}\left[W^{-as} + W^{r-bs} + W^{2r-cs} + W^{3r-ds} + \cdots\right], \tag{10}$$


where, as usual, further simplifications can be made using the relations 
in (6). 
For a proof of this theorem, see the Appendix. 


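Example 1: Let $P_1$: [0 → 2 → 1 → 3 → 0]. That is, N = 4, $W = -j$, $W^2 = -1$,
$W^3 = j$, and $W^4 = 1$; and a = 2, b = 3, c = 1, and d = 0.


$$T_1 = \begin{bmatrix} -1/2 & (1+j)/2 & j/2 & 0 \\ (j-1)/2 & 0 & (-j-1)/2 & 0 \\ -j/2 & (1-j)/2 & -1/2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

$$T_1^2 = \begin{bmatrix} 0 & 0 & -j & 0 \\ 0 & -1 & 0 & 0 \\ j & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad T_1^4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$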
It is seen that $T_1$ produces a transformation where DFT values are added
together with different weights and phase shifts, while $T_1^2$ simply changes
the positions of DFT values, with varying phase shifts, without
modifying magnitudes. The fact that $T_1^4 = I$ could also be deduced by
noting that in the time domain, $P_1^4 = I$ (because the order of $P_1$ is 4).
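Example 1 can be reproduced numerically. The sketch below is an illustration only: it builds T once as the product in (8a) and once from the sum in (10), using plain Python complex arithmetic. It assumes that row i of P carries its 1 in column `perm[i]`, and it indexes rows and columns 0, 1, 2, 3, so the zero-frequency entry sits at index 0 rather than in the last row and column as in the matrices printed above. All function names are hypothetical.

```python
import cmath

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transform_from_product(perm):
    """T = F P F^{-1}, where row i of P has its 1 in column perm[i]."""
    n = len(perm)
    w = cmath.exp(-2j * cmath.pi / n)
    f = [[w ** (r * s) for s in range(n)] for r in range(n)]
    f_inv = [[f[r][s].conjugate() / n for s in range(n)] for r in range(n)]
    p = [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(f, p), f_inv)

def transform_from_theorem(perm):
    """T[r][s] = (1/N) * sum_i W^(i*r - perm[i]*s), i.e. relation (10)."""
    n = len(perm)
    w = cmath.exp(-2j * cmath.pi / n)
    return [[sum(w ** (i * r - perm[i] * s) for i in range(n)) / n
             for s in range(n)] for r in range(n)]

def close(a, b, tol=1e-9):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

perm1 = [2, 3, 1, 0]                      # [0 -> 2 -> 1 -> 3 -> 0]
t1 = transform_from_product(perm1)
t1_sq = matmul(t1, t1)
t1_4th = matmul(t1_sq, t1_sq)
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

print(close(t1, transform_from_theorem(perm1)))
print(close(t1_4th, identity))
```

Both checks are expected to print `True` if the two constructions agree and the fourth power of $T_1$ is the identity.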


Example 2: Let 


$$P_2 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \end{bmatrix}. \tag{11a}$$


$P_2$ corresponds to [0 → 0, 1 → N - 1, 2 → N - 2, ...]. Hence, from
(10),


$$N T_{rs} = 1 + W^{r-s(N-1)} + W^{2r-s(N-2)} + \cdots = 1 + W^{r+s} + W^{2(r+s)} + \cdots = \frac{1 - W^{(r+s)N}}{1 - W^{r+s}}$$


using (6); since the numerator in the above sum is always zero, $T_{rs}$ is
nonzero if and only if the denominator is zero, or if s = MN - r; M = 0,
1, 2, ...; in this case, $N T_{rs} = N$, or $T_{rs} = 1$. Consequently,


$$T_2 = P_2. \tag{11b}$$


In words, local (intra-block) time-inversion (excluding the first sample) 
causes a corresponding DFT inversion (excluding the first sample). Notice 
that an unchanged first sample in the DFT vector represents an unchanged
zero-frequency value of the input spectrum. (We emphasize
here that DFT inversion does not represent analog frequency inversion,
which will be discussed in Section III. The DFT spectrum is symmetrical
about its middle point; inverting it is a perceptually trivial operation (see
Section IV), and DFT inversion does not represent a useful means for
speech encryption.)


2.3 Pseudorandom permutations 


A pseudorandom (PR) scrambling of samples within a block of length 
N is achieved by assigning to each sample a new address A(A = 1, or 2, 
or 3,---,or N) determined by the state of a maximal-length shift-register 
arrangement. The theory and design of maximal-length sequences is well 
documented.®*-®.” We shall therefore only provide a constructive recap- 
itulation in this paper. The approach is to start with a shift register whose 
length is $D = \log_2 N$ (assume that the block length N is a power of 2 and
that elements in the register are either 1 or 0). The next step is to select
a primitive polynomial Q(y) of degree D and to include stage D - d in
the register (d = 0, 1, ..., D - 1) in an EXCLUSIVE OR feedback (modulo
2-add) arrangement if the coefficient of $y^d$ in Q(y) is nonzero. The
resulting network now generates a succession of $2^D - 1 = N - 1$ nonzero
states in the shift register at successive clock times, after which the cycle
repeats, starting once again with the original initial state of the shift
register. The number of nonzero states in the cycle is identically equal
to the repetition period N - 1 of the cycle. Consequently, the N - 1 states
of the shift register (specifically, their decimal equivalents) can be used
as PR addresses for a block of N - 1 input samples in a one-to-one
mapping of addresses. If the input block has N rather than N - 1 samples
(because of the frequent requirement that N be a power of 2), the
address of the Nth sample is usually left unaltered by the scrambler.
Such simplification is, however, not mandatory, and appropriate
manipulations that scramble all N samples are quite conceivable.

Fig. 1. Pseudorandom scrambler with a five-stage shift register.

Figure 1 illustrates the scrambler design for the example of D = 5 and 
N = 31, as defined by a primitive polynomial $Q_5(y) = y^5 + y^2 + 1$. We
see how input samples (1, 2, 3, 4, 5, 6, 7, ---) get scrambled into PR po- 
sitions (1, 16, 8, 4, 18, 9, 20, --- ). We can verify easily that the mapping 
is one to one for the 31 sample values in an input block. The PR scram- 
bling of this example is illustrated more completely in the permutation 
matrix in Section 2.5 (Fig. 3a). 
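The address sequence quoted above can be regenerated with a short simulation. This is a sketch under stated assumptions rather than a description of the hardware in Fig. 1: the register is taken to shift right, stage 1 is taken as the most significant bit, and the feedback is the modulo-2 sum of stages 5 and 3 (the stages D - d for d = 0 and d = 2), written into stage 1. The function name `pr_addresses` and the bit-position argument `taps` are hypothetical.

```python
def pr_addresses(taps=(0, 2), stages=5, seed=0b00001):
    """Successive register states (as decimals) over one full cycle.

    taps are bit positions counted from the least significant bit, so
    position 0 is stage 5 and position 2 is stage 3 of a five-stage register.
    """
    state = seed
    addresses = []
    for _ in range(2 ** stages - 1):
        addresses.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1          # modulo-2 add of tapped stages
        state = (state >> 1) | (feedback << (stages - 1))
    return addresses

addresses = pr_addresses()
print(addresses[:7])
print(len(set(addresses)), min(addresses), max(addresses))
```

Under these assumptions the first seven states match the positions listed in the text, and the 31 states are distinct and nonzero, which is what makes the mapping one to one.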

It is clear that in the arrangement of Fig. 1, the use of a different ini- 
tializing sequence (other than 00001) can lead to a totally different 
mapping of sample addresses. There would be N — 1 nonzero initiali- 
zations corresponding to every given Q5(y). In addition, the number of 
primitive polynomials Q5(y) of degree 5 is 6, and this implies a further 
increase in the total number of possible mappings. 
Table I. List of primitive polynomials

Degree D   Typical primitive polynomial Q_D(y)   Number of primitive polynomials L(D) of degree D
1          y + 1                                  1
2          y^2 + y + 1                            1
3          y^3 + y + 1                            2
4          y^4 + y + 1                            2
5          y^5 + y^2 + 1                          6
6          y^6 + y + 1                            6
7          y^7 + y + 1                            18
8          y^8 + y^4 + y^3 + y^2 + 1              16
9          y^9 + y^4 + 1                          48
10         y^10 + y^3 + 1                         60
11         y^11 + y^2 + 1                         176
12         y^12 + y^6 + y^4 + y + 1               144




Table I lists, for D = 1 to 12, a typical set of primitive polynomials, and 
also the number of primitive polynomials L(D) for each D. Note, for 
example, that a 12-stage shift register with an EXCLUSIVE OR feedback 
network involving stages 12, 11, 8, and 6 (D — d;d = 0, 1, 4, and 6) pro- 
vides one of 144 possible bases for a scrambler that would operate on an 
input block of $2^{12} = 4096$ samples.

The possibility of distinct scrambler mappings (number of keys) as 
defined by different initializations and/or different primitive polyno- 
mials is an important consideration from the point of view of the average 
descrambling time needed for an eavesdropping code-breaker. Let us 
note formally, then, that the number of keys K for a PR permutation of 
N sample blocks ($N = 2^D$) is


$$K_{PR} = (N - 1)\, L(\log_2 N). \tag{12}$$

The first term of the product in (12) represents the total number of
shift-register initializations, and the second term gives the number of
primitive polynomials (Table I) of degree $D = \log_2 N$. Numerical values
of $K_{PR}$ are discussed in Section 2.5.
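Relation (12) is easy to tabulate. The sketch below is an illustration, not taken from the paper: instead of looking L(D) up in Table I, it uses the standard number-theoretic result that the number of primitive polynomials of degree D over GF(2) is φ(2^D - 1)/D, where φ is Euler's totient function. The helper names are hypothetical.

```python
def euler_phi(n):
    """Euler's totient function by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def num_primitive_polynomials(d):
    """L(D): number of primitive polynomials of degree d over GF(2)."""
    return euler_phi(2 ** d - 1) // d

def k_pr(n):
    """Number of PR keys from (12) for a block length n that is a power of 2."""
    d = n.bit_length() - 1
    return (n - 1) * num_primitive_polynomials(d)

print([num_primitive_polynomials(d) for d in range(1, 13)])
print({n: k_pr(n) for n in (8, 16, 32, 64, 128, 256)})
```

The first line of output can be compared with the L(D) column of Table I, and the second with the PR column of Table II.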


2.4 Uniform permutations 


We now discuss the new subclass of scramblers defined by an out- 
put/input (s-r) mapping indicated in (1a): 


$$s = k_1 r \pmod{N}, \qquad r = 1, 2, \ldots, N; \quad k_1 \text{ is prime to } N. \tag{13a}$$

It can be verified that $k_1$ should be prime to the block length N for the
mapping (13a) to be one to one. Figure 2 illustrates these permutations
for the example of N = 32 and $k_1 = 7$. The complete permutation matrix
is discussed in Section 2.5 (Fig. 3b), where it may be noted that the 1s
in the matrix are spread uniformly in the 32 by 32 matrix; hence the
name uniform permutations, or U permutations.

Fig. 2. Uniform scrambler: uniform one-to-one mapping r → 7r (mod 32), r = 1, 2, ..., 32.

The descrambling of a U permutation is achieved by means of another
(inverse) U permutation so that $k_1 k_2 = 1$ (modulo N):


$$r = k_2 s \pmod{N}, \qquad s = 1, 2, \ldots, N, \tag{13b}$$


and $k_1 k_2 = 1$ (modulo N). The condition that $k_1 k_2 = 1$ follows from the
requirement that $k_2 s = k_2 (k_1 r) = r$, for all r. Thus, for N = 32, if
scrambling is done with $k_1 = 7$, descrambling will require $k_2 = 23$ in order
that $k_1 k_2 = 161 \bmod 32 = 1$.
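The scrambling and descrambling pair (13a) and (13b) can be sketched in a few lines. This is an illustrative example only; the function name `u_scramble` is hypothetical, positions are indexed 0 to N - 1 (so position N of the text appears as index 0 and stays in place), and the sample at position r is taken to move to position s. The inverse multiplier is obtained with Python's three-argument `pow`, which accepts an exponent of -1 from Python 3.8 onward.

```python
from math import gcd

def u_scramble(block, k):
    """Move the sample at position r to position s = k*r mod N."""
    n = len(block)
    if gcd(k, n) != 1:
        raise ValueError("k must be prime to the block length")
    out = [None] * n
    for r, sample in enumerate(block):
        out[(k * r) % n] = sample
    return out

n, k1 = 32, 7
k2 = pow(k1, -1, n)                 # multiplier with k1 * k2 = 1 (modulo N)
block = list(range(n))              # sample values equal to their positions
scrambled = u_scramble(block, k1)
restored = u_scramble(scrambled, k2)

print(k2, (k1 * k2) % n)
print(scrambled[:8])
print(restored == block)
```

Because descrambling is just another U permutation, the same routine serves both ends of the link; only the multiplier differs.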


2.4.1 Choice of $k_1$


The value of $k_1$ in (13a) is an interesting issue. We do not offer any
rigorous criterion for optimizing $k_1$, and we believe in general that there
is no single optimum for practical applications where, in fact, we use
different $k_1$ values as different keys in encryption. However, there are
three observations regarding $k_1$ that are worth noting.

First, a reasonable scrambling procedure is one that distributes N 1s
uniformly in an N by N permutation matrix. For simplicity, let N be a perfect
square. The uniform distribution problem is then one of dividing the N
by N matrix into a number N of $\sqrt{N}$ by $\sqrt{N}$ submatrices, and placing
one and exactly one of the N 1s in the center of each submatrix. Adjacent
1s would then be separated by a distance equal to the side of the square
submatrix. Indeed, this distance $\sqrt{N}$ would correspond to the uniform
mapping parameter $k_1$. On the other hand, N in general need not be a
perfect square, and $\sqrt{N}$ need not be an integer. Furthermore, $k_1$ should
be prime to N for one-to-one mapping, and if $\sqrt{N}$ should indeed be an
integer, it will not be prime to N. All that can be said in conclusion, then,
is that in the absence of a better criterion, a value of $k_1$ that is close to
$\sqrt{N}$, and at the same time is prime to N, would represent a reasonable
design value.
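The heuristic of the preceding paragraph can be turned into a small search. The sketch below is only one possible reading of "close to $\sqrt{N}$ and prime to N"; the function name `k1_candidates`, the exclusion of the trivial value k = 1, and the tie-breaking rule (smaller k first) are assumptions made for illustration.

```python
from math import gcd, sqrt

def k1_candidates(n, count=4):
    """Values of k prime to n, ordered by their distance from sqrt(n)."""
    target = sqrt(n)
    coprime = [k for k in range(2, n) if gcd(k, n) == 1]
    return sorted(coprime, key=lambda k: (abs(k - target), k))[:count]

print(k1_candidates(32))
print(k1_candidates(64))
```

For N = 32, the value $k_1 = 7$ used in Fig. 2 appears among the nearest candidates, consistent with the view that no single value is optimal and that several nearby multipliers can serve as keys.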

A second viewpoint on the value of $k_1$ relates to adjacent sample
correlations. The U permutation causes input samples separated by a
distance $k_1$ to be brought together. Thus, if input samples separated by the
distance $k_1$ are uncorrelated, then the U permutation will convert the
original N sequence to one in which adjacent samples tend to be
uncorrelated. In a subsequent section, we discuss cross-correlations between
input and scrambled sequences (and corresponding cross-spectra).

The third observation is that there is an interesting geometrical
interpretation of $k_1$. It is related to the slopes of the zig-zag straight line
loci obtained by joining successive 1s in the permutation matrix while
scanning it from top to bottom (increasing r).

Finally, notice that the special value of $k_1 = N - 1$ corresponds to DFT
inversion (local time inversion, as discussed earlier).


2.4.2 Number of keys in U permutations 


It can be seen that the number of keys (distinct mappings) K for U
permutations is a product of the form


$$K_U = N\,G(N), \tag{14}$$


where G(N) is used to denote the number of $k_1$ values that are prime to
the block length N, and N is the number of cyclic shifts (translations by
one sample) of an input block prior to permutation via a given
permutation matrix (which is equivalently expressed as the number of ways
of selecting the first row in vertical cyclic shifts of the permutation
matrix). It can be verified that the following values of G apply for the special
cases where N is prime or a power of two:


$$G(N) = N - 2 \ \text{ if } N \text{ is prime}; \qquad G(N) = N - D - 1 \ \text{ if } N = 2^D. \tag{15}$$


Numerical values of $K_U$ are discussed in Section 2.5.


2.4.3 Effect of U permutations on frequency spectrum 


The effect of U permutations on input spectrum follows immediately
from an earlier relation (10). For U permutations, then, the DFT
transposition matrix T is characterized by


$$N T_{rs} = 1 + W^{r - s k_1} + W^{2r - 2 s k_1} + \cdots. \tag{16a}$$

Summing the geometric series above,

$$N T_{rs} = \frac{1 - W^{N(r - s k_1)}}{1 - W^{r - s k_1}}.$$

Because $W^N = 1$ [from (6)], the numerator above is always zero. A
nonzero $T_{rs}$ therefore requires that the denominator is zero as well:

$$W^{r - s k_1} = 1, \quad \text{or} \quad r - s k_1 = 0, \quad \text{or} \quad s = r/k_1. \tag{16b}$$

It can be verified from (16a) that this condition will make $T_{rs} = 1$.
Furthermore, if (16b) should hold for all r, one requires that s/r should be
independent of r. That is,

$$s = k_3 r, \tag{16c}$$

so that from (16b),

$$k_3 r = r/k_1, \quad \text{or} \quad k_1 k_3 = 1 \pmod{N}. \tag{16d}$$

In other words, uniform permutation in the time domain causes a
uniform permutation in the frequency domain. Thus, if the U permutations
use $k_1 = 7$, $k_3 = 23$ from (16d); and the mapping (16c) (which
denotes nonzero $T_{rs}$ values) indicates the following frequency
transpositions: [0 → 0, 1 → 23, 2 → 14, 3 → 5, ... or in general t → 23t (modulo
32) for t = 0, 1, ..., 31]. Notice, once again, the special example of DFT
inversion and the corresponding local time inversion; these are expressed
by the mappings $k_3 = N - 1$ and $k_1 = N - 1$, respectively.

Fig. 3(a). Pseudorandom permutation matrix. (Axis label: N_out.)

Fig. 3(b). Uniform permutation matrix.
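The frequency transposition t → 23t (modulo 32) can be confirmed with a direct DFT. The sketch below is illustrative and rests on one stated assumption: the permuted block is formed in the matrix sense of (7b), that is, its sample i is input sample $k_1 i$ (modulo N), which is the convention under which (10) and (16a) hold. The test block is arbitrary and the helper `dft` is a plain O(N²) transform written for self-containment.

```python
import cmath

def dft(x):
    """Direct DFT with kernel W = exp(-j*2*pi/N)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * r * i / n) for i in range(n))
            for r in range(n)]

n, k1 = 32, 7
k3 = pow(k1, -1, n)                                   # k1 * k3 = 1 (modulo N)

x = [((3 * i * i + 5 * i + 1) % 17) - 8.0 for i in range(n)]   # arbitrary block
x_p = [x[(k1 * i) % n] for i in range(n)]             # U-permuted block, P x

X, X_p = dft(x), dft(x_p)

# (16c): the DFT sample at index r of the permuted block is input DFT sample k3*r.
max_err = max(abs(X_p[r] - X[(k3 * r) % n]) for r in range(n))

print(k3, [(k3 * t) % n for t in range(4)])
print(max_err < 1e-9)
```

Only positions change; no DFT magnitude is altered, which is the point made about U permutations in Section 2.5.3.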


2.5 Comparisons of PR and U permutations 


2.5.1 Permutation matrices 


Figures 3a and b are typical permutation matrices for PR and U
permutations. As in Fig. 1, the PR matrix of Fig. 3a is based on a fifth-order
primitive polynomial $Q_5(y) = y^5 + y^2 + 1$, a beginning shift-register state
of 00001 and a mapping of position 32 into itself. Following Fig. 2, the
U matrix of Fig. 3b is based on a value of $k_1 = 7$.


2.5.2 Number of keys in encryption 


Table II lists illustrative values of the number of keys $K_{PR}$ and $K_U$,
as obtained from (12) and (14), respectively. Note that with both PR and
U mappings, not all of the total number of permutations can be equally
effective in destroying speech intelligibility. A good example of a
perceptually uninteresting permutation is DFT inversion* (see Section IV).
In any case, on the basis of Table II, $K_U$ increases much faster with N
than $K_{PR}$. For this reason, U permutations are potentially more
attractive candidates, a priori, for formal encryption.†

Table II. Number of keys in PR and U permutations

N (number of samples in block)   K_PR = (N - 1) L(log2 N)   K_U = N G(N)
8                                14                         32
16                               30                         176
32                               186                        832
64                               378                        3648
128                              2286                       15360
256                              4080                       63232


2.5.3 Effects on frequency spectrum 


Assessments of speech encryption techniques should be really based 
on perceptual testing; Section IV describes such testing and also provides 
interesting speech spectrograms. Meanwhile, it is instructive to compare 
PR and U permutations on the basis of their effects on illustrative input 
spectra. 

Figure 4 provides these comparisons. Figures 4 through 6 use linear 
low-pass, high-pass, and mid-pass models for input spectra, while the 
input spectrum of Fig. 7 is a simplified three-pole model that could be 
an example of the short-term spectrum of a voiced speech sample. Note 
that both PR and U permutations are effective in distorting input spectra; 
in fact, the tendency in each case is to whiten the spectrum. However, 
the whitening in PR permutations is both global and local, while U per- 
mutations produce whitening only in a global (average) sense. In other 
words, PR-scrambled spectra are smoother in terms of adjacent sample 
transitions in the output spectrum. This is due to the fact that entries 
in the DFT-transposition matrix T, eqs. (7a) and (8a), have varying 
weights in PR permutation. With U permutations, on the other hand, the 
only nonzero entries in T are ones; the effect is simply one of rearranging
input DFT samples without any magnitude weighting. 


2.6 Cross-correlations between input and scrambled samples 


Spectral distortions that provide optimal speech encryption are
difficult to specify; and further, they are likely to be input-spectrum
dependent. It appears reasonable, however, that the whitening effects in
Figs. 4 through 7 are very desirable for encryption. A different criterion
for encryption is the decorrelation of input and scrambled spectra;
spectral whitening is in general a sufficient, but not necessary, condition
for such decorrelation.


* It is interesting that DFT inversion is ineffective despite the fact that the associated
permutation (time inversion) is characterized by a large input-output “distance”
$\sum_r |r - s(r)|$.


† Another advantage of U permutations is that N need not be a power of 2.

Fig. 4. Effects of PR and U permutations on low-pass spectrum: (a) original spectrum; (b) U permutation spectrum (k = 7); (c) PR permutation spectrum. (Horizontal axes marked 0 to 28 in steps of 4.)

Let us define a correlation measure C by $x'x$, which represents the dot
product of an input sequence with itself. Correlation between scrambled
and original sequences is $x_P' x$, where the subscript P indicates
permutation or scrambling. A corresponding correlation between input and
scrambled spectra is $X_P' X^*$, where $X^*$ is the complex conjugate of the
input spectrum X. Notice that


$$X_P' X^* = (FPx)'(Fx)^* = (Px)' F' F^* x.$$


It can be shown from (5) that $F'F^* = NI$, where I is an identity matrix.
Consequently, spectral and time-domain cross-correlations between
input and scrambled samples are related by

$$X_P' X^* = N (Px)' x. \tag{17}$$

Fig. 5. Effects of PR and U permutations on high-pass spectrum: (a) original spectrum; (b) U permutation spectrum; (c) PR permutation spectrum.

Fig. 6. Effects of PR and U permutations on mid-pass spectrum: (a) original spectrum; (b) U permutation spectrum; (c) PR permutation spectrum.

In other words, spectral decorrelation requires that the permuted and
the original sequences are themselves orthogonal or uncorrelated. The
significance of negative cross-correlations is not always clear. In fact,
in at least one instance, a negative cross-correlation of -1 between input
and scrambled spectra is known to be perceptually suboptimal. This is
the example of frequency inversion (Section III); it has been
reported that listeners can be trained to understand frequency-inverted
speech (Ref. 1). It would appear then that scramblers that cause the
cross-correlations in (17) to approach zero are, in general, very desirable. The
case of zero correlation is realized for U permutation, for example, if input
message samples that are separated by a distance of $k_1$ are uncorrelated.

Fig. 7. Effects of PR and U permutations on an example of a voiced speech spectrum: (a) original spectrum; (b) U permutation spectrum; (c) PR permutation spectrum.
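Relation (17) can be spot-checked numerically for a real input block. The sketch below is an illustration under stated assumptions: the block and the permutation are arbitrary (drawn from a seeded generator purely for repeatability), the permuted block is formed as in (7b), and `dft` is a direct transform written out so that the example is self-contained.

```python
import cmath
import random

def dft(x):
    """Direct DFT with kernel W = exp(-j*2*pi/N)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * r * i / n) for i in range(n))
            for r in range(n)]

n = 16
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]        # real input block
perm = list(range(n))
rng.shuffle(perm)                                     # arbitrary permutation
x_p = [x[perm[i]] for i in range(n)]                  # permuted block, P x

X, X_p = dft(x), dft(x_p)

spectral = sum(a * b.conjugate() for a, b in zip(X_p, X))   # X_P' X*
temporal = n * sum(a * b for a, b in zip(x_p, x))           # N (Px)' x

print(abs(spectral - temporal) < 1e-9)
```

The spectral cross-correlation comes out real and equal to N times the time-domain one, so driving either toward zero drives the other toward zero as well.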


2.7 Permutations of binary sequences 


In practical speech-encryption techniques, we scramble speech- 
carrying bits in digital codes such as PCM, DPCM, and DM? rather than 
PAM speech samples. Although the analyses of preceding sections can
be applied in principle to the special case of binary samples, the utility
of such analyses may be limited because the issue of interest is the effect
of scrambling on the decoded speech spectrum rather than the spectrum
of the bits itself. Furthermore, there are many other factors, such as the
varying significance of PCM (or DPCM) bits depending on their position
in a PCM (or DPCM) word. In view of the above problems, we have
deferred our observations on speech code scrambling to the section on
perceptual experiments (Section IV), rather than attempt analytical
predictions. We briefly analyze, on the other hand, the interesting case
of permutations with sign changes, before proceeding to summarize
results of perceptual experiments.

Fig. 8. Frequency inversion: (a) original DFT spectrum; (b) frequency-inverted DFT spectrum. (Horizontal axes marked 0 to 5.)


III. TRANSFORMATIONS WITH SIGN CHANGES—FREQUENCY
INVERSION

The total number of permutations of a sequence of N samples is N!.
If sign changes are allowed in addition to position changes, the total
number of possible transformations increases to $2^N N!$. The analytical
approaches of Section II can perhaps be extended to study such trans-
formations. The purpose of this section, however, is only to consider a
very specific transformation. This transformation involves no explicit
permutation, but it introduces sign changes in every other input sample.
This is equivalent to the classical encryption technique of analog fre-
quency inversion.¹

We begin by recapitulating that DFT inversion, as given by (11a) and
(11b), does not constitute analog frequency inversion, because the DFT
of a real signal is symmetric in the sense $|X(r)| = |X(N - r)|$ for r = 1,
2, ..., N, and the highest analog frequency corresponds to [N/2], where
[ ] indicates “largest integer in.” Analog frequency inversion would
result, on the other hand, if the DFT coefficients are inverted about N/4
(Fig. 8), assuming that N is even. (Such inversion is achieved if all the
DFT coefficients are translated cyclically through a distance of N/2. For
odd N, there is a modification of these coefficients as well, since a simple
translation will render the spectrum asymmetric.)

Fig. 9—Frequency inversion by modulation and low-pass filtering. (a) Original band-limited spectrum. (b) Result of modulating spectrum with carrier at Z. The left half of (b) is the inverse of (a).

For a cyclic shift through N/2, the N by N frequency transformation
matrix (for even N) is

$$T_{\text{FREQ INVERSION}} = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix} \qquad (18a)$$


where the submatrices are each of order N/2. From (8b), the corre- 
sponding input transformation is 


$$P_{\text{FREQ INVERSION}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix} \qquad (18b)$$


Thus, analog frequency inversion is obtained if alternate input samples 
are multiplied by —1. This is also apparent from the fact that frequency 
inversion occurs when a band-limited signal is modulated by a carrier 
at the highest input frequency, and out-of-band frequencies are filtered 
out from the modulation product (see Fig. 9). In the digital technique, 
the sequence of +1 and —1 elements in T corresponds to the carrier, and 
its frequency is simply the frequency of +1 (or —1) entries. 
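
The equivalence between alternate sign changes and a cyclic DFT shift through N/2 can be checked numerically. The following is a minimal sketch, not part of the original simulations; the DFT sign convention, the helper names `dft` and `invert_alternate_signs`, and the sample block are assumptions made for illustration.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X(r) = sum_n x(n) exp(-j 2 pi r n / N) (assumed convention)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * r * n / n_pts) for n in range(n_pts))
        for r in range(n_pts)
    ]

def invert_alternate_signs(x):
    """Multiply every other sample by -1, as in the diagonal matrix of (18b)."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]

# Arbitrary real block with even N = 8 (illustrative values only).
x = [3.0, 1.0, -2.0, 0.5, 4.0, -1.5, 0.0, 2.5]
N = len(x)

X = dft(x)
Y = dft(invert_alternate_signs(x))

# Cyclic translation of the DFT coefficients through N/2, as in (18a).
shifted = [X[(r + N // 2) % N] for r in range(N)]

# Largest deviation between the two spectra; it should be at rounding level.
max_err = max(abs(a - b) for a, b in zip(Y, shifted))
print(max_err)
```

The check relies only on the identity $(-1)^n = e^{j\pi n}$, so it holds for either sign convention of the DFT kernel.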


IV. EXPERIMENTAL RESULTS 


In this section, we summarize experimental observations on speech 
waveform encryption using PAM samples as well as APCM, ADPCM, and 
ADM codes.* The letter A in code notation stands for instantaneously 
adaptive quantization,® and our DPCM and DM codes used simple first- 
order predictors. Our results are from computer simulations, and the 
studies have included PR and U permutations, analog frequency inver- 
sion, and (the academic case of) DFT inversion. Our conclusions are based 
on informal perceptual tests and on spectrograms of original and
encrypted speech. The speech sample used in the experiments was a
2-second band-limited (200 to 3200 Hz) utterance, “The chairman cast
three votes.”

4.1 Perceptual observations

4.1.1 Scrambling with PR and U permutations 


The sampling frequency for PAM and ADM bits was 24 kHz. The APCM 
and ADPCM bits used Nyquist-sampled (8 kHz) speech with 3 bits of 
quantization per sample. Thus, the scramblers operated on 24-kHz se- 
quences in all four cases, and the encoding delay due to scrambling was 
the same for all the four waveform formats as long as the scrambler block 
length N was the same. Values of N ranged from 4 to 64. The speech- 
encoding qualities resulting in the identical bit-rate (24 kb/s) ADM, 
APCM, and ADPCM codes were, of course, noticeably different, but this 
issue is not strictly relevant to the present discussion. The following were 
the main observations on encryption efficiency. 

(i) In comparing pseudorandom and uniform permutations, we did
not notice important perceptual differences: the U-permutation speech
scramblers were about as good as the PR scramblers for given N, and,
in fact, they were slightly better in some cases than PR scramblers.

(ii) In comparing speech code formats from the point of view of the
benefits of scrambling, there was a definite ordering of efficiency:


PAM < ADM < APCM < ADPCM. (19) 


This means that for a given encoding delay (or scrambling block length
N), the least desirable candidate for scrambling is a PAM sequence, while
the most desirable format is an ADPCM code. The following specific
observations are worth noting:

(a) Scrambling of PAM samples is marginally effective with N = 64.

(b) Scrambling of ADM samples is quite effective with N = 64. In fact,
for very casual encryption using ADM bit-scrambling, even N = 32
can be useful; with N = 32, the specific U permutations that
were used ($k_1$ = 7) were slightly more effective than the PR
permutations.

(c) Scrambling of APCM and ADPCM bits destroys speech intelligi-
bility very effectively with N = 32, while for casual encryption
even N = 16 can be useful. Thus, ADPCM at 24 kb/s, with a
scrambling delay of 16/24 = 0.67 ms, can be a very attractive
candidate for practical speech-privacy systems (for example,
optional facilities for privacy in mobile telephony).
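
Block scrambling and the delay figure quoted in (c) can be sketched as follows. This is an illustration only: the keyed shuffle from Python's `random` module is a stand-in for the PR and U permutation families of the paper, and the names `make_permutation`, `scramble`, and `unscramble` are hypothetical.

```python
import random

def make_permutation(n, key):
    """Keyed permutation of range(n); a stand-in, not the paper's PR or U families."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm

def scramble(bits, perm):
    """Permute each complete block of len(perm) bits; a trailing partial block is left as is."""
    n = len(perm)
    usable = len(bits) - len(bits) % n
    out = []
    for start in range(0, usable, n):
        block = bits[start:start + n]
        out.extend(block[p] for p in perm)
    out.extend(bits[usable:])
    return out

def unscramble(bits, perm):
    """Apply the inverse permutation block by block."""
    inverse = [0] * len(perm)
    for dst, src in enumerate(perm):
        inverse[src] = dst
    return scramble(bits, inverse)

N = 16                    # scrambling block length
bit_rate_kbps = 24        # 24-kb/s ADPCM bit stream
delay_ms = N / bit_rate_kbps   # time needed to collect one block of N bits

source = random.Random(1).choices([0, 1], k=4 * N)   # illustrative bit stream
perm = make_permutation(N, key=7)
scrambled = scramble(source, perm)
restored = unscramble(scrambled, perm)
print(round(delay_ms, 2), restored == source)
```

The delay is simply the time needed to accumulate one block, so it scales linearly with N at a fixed bit rate.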


4.1.2 Frequency inversion 


The speech formats used here were the same as for the scrambler
studies, except in the case of PAM samples. These were sampled at the
Nyquist frequency of 8 kHz (instead of 24 kHz) to provide frequency
inversion (Section III and Fig. 9), which is a classical speech-encryption
procedure.¹ As mentioned in Section III, the inversion of every other
Nyquist-rate PAM sample provides a simple digital technique for speech
frequency inversion. The sign-reversal operations on ADM, APCM, and
ADPCM codes are less interesting in that they do not provide speech
spectrum inversion, but only the inversion of respective bit spectra. In
any case, sign reversals of alternate samples provided effective speech
distortions for all the speech formats studied, with the following ordering
of encryption efficiency in informal tests:


ADM < PAM < APCM < ADPCM. (20) 


The important point, however, is that unlike in scrambling, encryption 
potential in frequency inversion cannot be increased by such simple 
means as increasing a block length N (N = 2 is the minimum as well as 
maximum useful block length for frequency inversion). More impor- 
tantly, owing to the fact that only one key is associated with this tech- 
nique, frequency inversion is not suitable for formal encryption with 
time-varying keys. Furthermore, even for casual privacy, the afore- 
mentioned utility of frequency inversion (with PAM samples) is to be 
qualified with the observation that listeners can be trained to follow 
frequency-inverted speech. Frequency inversions with ADM bits were 
rather ineffective, for reasons not completely understood by the authors; 
while the more effective APCM and ADPCM inversions (20) were no more 
efficient than APCM- and ADPCM-based scramblers. (See, for example, 
spectrograms in Figs. 10a and b.) 

Finally, perceptual tests confirmed our earlier observation that DFT 
inversions are academic and useless for speech encryption. 


4.2 Speech spectrograms 


Figure 10 reinforces and supplements the perceptual tests summarized 
in the previous section. 

Figure 10a compares the classical technique of frequency inversion 
with a scheme that inverts the signs of alternate ADPCM bits. Notice the 
lack of speech-like patterns in the latter example and the contributions 
of ADPCM quantization noise that also serve to reduce speech intelligi- 
bility. In contrast, the frequency inversion of PAM samples leads to a 
spectrogram that is informationally equivalent to the original speech 
spectrogram, being only a mirror image of it across the 4-kHz line. 

Figure 10b shows the benefits of increasing the block length N in 
temporal scrambling for the example of ADPCM codes and PR permu- 
tations. 

Figure 10c demonstrates the effect of U permutations in scrambling,
and compares PAM, ADM, and ADPCM samples as candidates for
scrambling. The spectrogram confirms the ordering of efficiency noted
in (19): PAM < ADM < ADPCM.

Fig. 10(a)—Spectrograms of original and scrambled speech: original speech, frequency inversion, sign changing of alternate ADPCM bits.

Fig. 10(b)—Spectrograms of PR scrambling of ADPCM bits.

Fig. 10(c)—Spectrograms of U scrambling of speech codes (N = 16): PAM, ADM, ADPCM. Panels show 24-kHz PAM samples, 24-kb/s ADM bits, and 24-kb/s ADPCM bits for the utterance “The chairman cast three votes.” Axes: frequency in kilohertz versus time (approximate total duration is two seconds).


V. ENCRYPTION, SCRAMBLING, AND MASKING 


The preceding sections have looked mainly at effective privacy 
employing temporal permutations. The effectiveness of scrambling for 
encryption would depend on the rate at which the key is changed, the 
distortion produced by each key, and the number of keys available. As- 
suming that the distortion introduced by a typical key is adequate, a 
measure of goodness of any subfamily of permutations would be the 
number of members of that set. On this count, U permutations are better
than PR permutations ($K_U > K_{PR}$ for a given N, Table II). We do not
have a sufficient understanding of how many of these $K_U$ or $K_{PR}$ per-
mutations are desirable from the point of view of distorting speech.* But
we believe that for both PR and U permutations, the numbers of trivial
or obviously useless permutations are small fractions of the number of
possibilities in Table II. It can also be shown that the degree of com-
plexity in encryption is equal to $\log_2 K$. This number results from sta-
tistical procedures that minimize the number of receivers needed for
cryptanalysis.⁴

Many so-called speech scramblers encrypt speech not by temporal 
permutations (as in our definition of scrambling), but by the addition 
of appropriate masking signals. A typical example is the EXCLUSIVE OR 
mod-2 addition of a pseudorandom binary sequence, bit by bit, to an 
ADM or PCM bit stream. 
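
A minimal sketch of such masking is given below. It assumes a seeded generator from Python's `random` module as a stand-in for the pseudorandom binary sequence (a practical system would use a shift-register or cryptographic generator), and the names `pr_key_stream` and `mask` are hypothetical.

```python
import random

def pr_key_stream(key, length):
    """Pseudorandom binary sequence derived from an integer key (illustrative stand-in)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(length)]

def mask(bits, key):
    """EXCLUSIVE OR (mod-2 addition) of the bit stream with the key stream, bit by bit."""
    stream = pr_key_stream(key, len(bits))
    return [b ^ k for b, k in zip(bits, stream)]

# Illustrative 16-bit segment of an ADM or PCM bit stream.
speech_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

masked = mask(speech_bits, key=2024)
# Mod-2 addition is its own inverse, so the same key stream removes the mask.
recovered = mask(masked, key=2024)
print(recovered == speech_bits)
```

Because the operation is self-inverse, transmitter and receiver need only agree on the key that generates the sequence.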

If the masking signal is such that the spectrum of the sum signal is 
white, we obtain a perfect cipher in some sense. However, the entropy 
of the key must equal the entropy of the signal to be encrypted, and a 
perfect cipher is an idealized case in this sense. Realistically, we seek 
masking signals that change slowly with the message waveform, and 
typical resultant spectra are non-speech-like although not perfectly 
white. 

The use of modulo-m additions (for example, modulo-2 additions) in masking
ensures that the amplitude range of the encrypted signal is the same as
that of the input signal; preservation of properties such as dynamic range
serves as a precaution against techniques of cryptanalysis.

Temporal scrambling has the following limitations with respect to
masking: (i) scrambling introduces encoding delays (= N) and (ii) it
leaves long (> N) tell-tale silences unaffected in speech encryption.
Masking, on the other hand, is characterized by a greater complexity of
the key signal. If delays of a few milliseconds are not objectionable and
if the exposure of the on-off (speech-silence) patterns is not a problem,
the simplicity of scrambling should make it more attractive.

* This depends not only on the properties of the permutations, but also on those of the
samples being permuted. Highly correlated PAM samples, for example, demand more
drastic permutations than less redundant ADM bits for a given N.

A final issue is the comparison of temporal scrambling and masking 
purely from the point of view of reducing the intelligibility of speech 
sounds. In at least one experiment,® performances have been found to 
be very comparable. This experiment used 24-kb/s ADPCM speech bits. 
For scrambling, a block length of N = 16 was employed, and for masking, 
a 16-bit PR number (binary sequence) was added (modulo 2), bit-by-bit, 
to each of 16 contiguous ADPCM bits in the speech code to be masked. 

For practical implementations, it is conceivable that sophisticated 
schemes may run permutation and masking operations in tandem. Also, 
if the associated keys are time-varying, it may be practical to use certain 
serial configurations for the shift register, as described in Ref. 9, in place 
of a bank of conventional hard-wired registers. 


APPENDIX 
Proof of Theorem (10) 


Recall from (8a) that $T = FPF^{-1} = [FP]F^{-1}$. The value of $T_{rs}$ is the
dot product of the sth column of $[F^{-1}]$ and the rth row of $[FP]$.

The sth column in $F^{-1}$ has a typical element $W^{-Rs}/N$, where R is the
row number [see (9)].

To evaluate the dot product that gives $T_{rs}$, we need $V_{rR}$, the Rth
column element in the rth row of $[FP]$. The quantity $V_{rR}$ in turn is the
product of the rth row in F and the Rth column in P. The only nonzero
elements in P are the 1s that occur in positions a, b, c, d, ... in rows 0,
1, 2, 3, ... as characterized by P[0 → a, 1 → b, 2 → c, 3 → d, ...]. Fur-
thermore, the rth row in F has a typical element $W^{rn}$, where n is the
column number [see (5)]. Consequently, $V_{rR}$ has the form $W^{r\,n(R)}$, where
n(R) is the column element in the rth row of F, which gets multiplied
by the single 1 entry in the Rth column of P. (The other elements in the
rth row of F get multiplied by 0 entries in the Rth column of P, and do
not contribute to $V_{rR}$.)

Consequently, the dot product that specifies $T_{rs}$ has the form

$$T_{rs} = \frac{1}{N}\sum_{R=0}^{N-1} V_{rR}\,W^{-Rs} = \frac{1}{N}\sum_{R=0}^{N-1} W^{r\,n(R) - Rs}.$$

Because of a one-to-one mapping between R and n(R), the above sum-
mation can be rewritten as a summation over n(R):

$$T_{rs} = \frac{1}{N}\sum_{n(R)=0}^{N-1} W^{r\,n(R) - Rs}.$$

In characterizing P, n(R) values of 0, 1, 2, 3, ... correspond to R values
of a, b, c, d, .... Hence,

$$T_{rs} = \frac{1}{N}\left(W^{-as} + W^{r-bs} + W^{2r-cs} + W^{3r-ds} + \cdots\right).$$
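
The closed form above can be compared numerically with the matrix product $FPF^{-1}$. The sketch below assumes $W = e^{-j2\pi/N}$ (equations (5) and (9) are not reproduced here, and either sign convention gives the same agreement) and represents P by a list `perm` in which `perm[n]` is the column holding the 1 in row n; the function names are hypothetical.

```python
import cmath

def theorem_entry(perm, r, s):
    """Closed form: T_rs = (1/N) * sum_n W^(n*r - perm[n]*s)."""
    n_pts = len(perm)
    w = cmath.exp(-2j * cmath.pi / n_pts)   # assumed definition of W
    return sum(w ** (n * r - perm[n] * s) for n in range(n_pts)) / n_pts

def direct_entry(perm, r, s):
    """Same entry from the product [FP] F^-1, accumulated term by term."""
    n_pts = len(perm)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    total = 0
    for big_r in range(n_pts):
        # V_rR: rth row of F times Rth column of P (a single nonzero term).
        v = sum(w ** (r * n) for n in range(n_pts) if perm[n] == big_r)
        # Rth element of the sth column of F^-1 is W^(-R s) / N.
        total += v * w ** (-big_r * s) / n_pts
    return total

perm = [2, 0, 3, 1]   # P[0 -> 2, 1 -> 0, 2 -> 3, 3 -> 1], illustrative
worst = max(
    abs(theorem_entry(perm, r, s) - direct_entry(perm, r, s))
    for r in range(len(perm))
    for s in range(len(perm))
)
print(worst)
```

For the identity permutation the closed form reduces to the identity matrix, which gives a quick sanity check on the sign conventions.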
Techniques for Coding Dithered Two-Level
Pictures


By A. N. NETRAVALI, F. W. MOUNTS, and J.-D. BEYER 
(Manuscript received October 14, 1976) 


This paper considers several methods for the efficient coding of 
two-level pictures dithered to give the appearance of multiple ampli- 
tude levels. In the dithering technique, a multilevel image signal is 
compared with a position-dependent set of thresholds (called a dither 
matrix), and, if the image value exceeds the threshold, the two-level 
output signal is taken to be “white,” otherwise it is taken to be “black.” 
Spatial correlation present in the original image is not preserved in the 
two-level picture due to spatial variation of the value of the threshold; 
and, therefore, standard techniques for coding two-level pictures, such 
as run-length coding, lose their efficiency. We show how some of our 
recently developed techniques for coding two-level pictures can be 
modified to code two-level dithered images. Our computer simulations 
on a few representative two-level dithered pictures indicate that an 
entropy between 0.2 to 0.3 bit/pel is possible using our technique. A 
comparison with some recently proposed techniques by Judice indi- 
cates that those schemes result in about 10 to 60 percent higher entropy 
than our schemes. 


I. INTRODUCTION


Techniques for representing the entire gray scale of a picture by only
two levels have been receiving considerable attention¹⁻¹⁰ because many
devices are limited to recording or displaying two-level signals. Although
these techniques may differ in specific algorithms, they provide the 
subjective illusion of a wide range of gray shades by controlling the 
proportion of picture elements in a neighborhood that are in the “on” 
state. 

One of the techniques (called “dither” and studied in detail by Limb,? 
Lippel and Kurland,’ and Judice et al.9-!°) consists of comparing the 
multilevel input image signal with a position-dependent set of thresholds
and setting only those picture elements to “white” (or 1) where the image
input signal exceeds the threshold. A square matrix of threshold values
(elements of a “dither matrix”) is repeated as a regular array to provide
a threshold pattern for the entire image. Subjective effects of gray shades
are achieved by using a dither matrix. The 4 by 4 dither matrix used by
Judice et al.⁹ for an input image having gray levels between 0 and 255
is shown in Fig. 1. The values of the adjacent elements of the dither
matrix were chosen to take advantage of the spatial low-pass filtering
present in the human visual system and, at the same time, to reproduce
edges accurately and to avoid objectionable patterns. When the input
image intensity is compared with spatially varying thresholds, a large
amount of spatial correlation present in the input image is suppressed.
This loss of correlation in the dithered two-level image makes some of
the standard methods of coding two-level signals, such as run-length
coding,¹¹ inefficient.

Fig. 1—A 4 by 4 dither matrix to be used for images with 8-bit quantized samples.
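
The thresholding operation can be illustrated with a short sketch. The threshold values below are a hypothetical Bayer-style pattern scaled to the range 0 to 240; they are not the matrix of Fig. 1, whose entries are not reproduced in this text. The function name `dither` is likewise an assumption.

```python
# Hypothetical 4 by 4 threshold pattern (NOT the matrix of Fig. 1).
DITHER = [
    [0, 128, 32, 160],
    [192, 64, 224, 96],
    [48, 176, 16, 144],
    [240, 112, 208, 80],
]

def dither(image, matrix=DITHER):
    """Set a pel to 1 ("white") where the 8-bit image value exceeds the tiled threshold."""
    n = len(matrix)
    return [
        [1 if pel > matrix[i % n][j % n] else 0 for j, pel in enumerate(row)]
        for i, row in enumerate(image)
    ]

# A flat mid-gray 8 by 8 test image (illustrative input).
flat_gray = [[128] * 8 for _ in range(8)]
two_level = dither(flat_gray)

# Fraction of pels turned on; the proportion of "on" pels conveys the gray shade.
white_fraction = sum(map(sum, two_level)) / 64
print(white_fraction)
```

With this pattern a constant input produces a regular on-off texture, which shows why adjacent two-level pels become poorly correlated even when the input image is smooth.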

Several modifications of the standard techniques are possible. Judice
has considered two such modifications. In one scheme,¹² called bit in-
terleaving, runs of picture elements corresponding to equal or near equal
elements of the dither matrix are coded using standard techniques. The
other scheme,¹³ called pattern matching, assigns a code to two-dimen-
sional bit patterns of the dithered image and relies for bit-rate reduction
on the fact that all possible bit patterns do not occur with the same fre-
quency.

In our schemes, the “state” of the coder when an element is to be coded 
is a function of the already transmitted values of surrounding elements 
and the dither threshold at the element to be coded. The value that 
minimizes the probability of prediction error conditioned on a state is 
the predicted value of the element to be coded. This is an extension of 
the predictive coding for two-level pictures discussed in Ref. 14. In one 
of our schemes, we code run-lengths of the prediction error. In our other
schemes, we change the relative order of the picture elements along a
scan line in such a way as to increase the average run-length of the black
and/or white elements and then transmit the run-lengths. These schemes
are described in detail in Section 2.3.

We have investigated the efficiency of our techniques by computer
simulation on a few representative 4- by 5-inch pictures, scanned with
an array of 512 by 512. Using run-length coding of prediction errors, it
is possible to decrease the bit rate to about 0.22 to 0.42 bit/pel. “Good-
bad” state ordering performs best among all the ordering schemes that
we have considered, and it brings the bit rates down to between 0.20 and
0.30 bit/pel. As a comparison, results from simulation of the same pic-
tures by Judice show a bit rate of between 0.22 and 0.48 bit/pel.


II. CODING ALGORITHMS


In this section, we describe our coding algorithms in detail and present 
results of our computer simulations. The pictures used for computer 
simulation are shown in Fig. 2. These pictures were 4 inches by 5 inches 
and were scanned with 512 samples along a line and 512 lines. Each 
picture element was digitized by a uniform PCM coder to an accuracy 
of eight bits (256 levels). The digitized signal was dithered by using the 
dither matrix shown in Fig. 1. The coding algorithms were applied to the 
dithered images. As a measure of performance, we used the sample
first-order entropy of run-length statistics. We computed the average
black and white run-lengths and the entropy of black and white runs
using, for example, the formula

$$E_w = -\sum_i \frac{n_i}{N}\,\log_2 \frac{n_i}{N},$$

where $E_w$ is the entropy of white run-lengths, $n_i$ is the number of white
runs of length i, and N is the total number of white runs. Using these
in eq. (1), we computed the entropy in bits/pel by:

$$E = \frac{E_w N_w + E_b N_b}{r_w N_w + r_b N_b} \qquad (1)$$

$$\phantom{E} = \frac{E_w + E_b}{r_w + r_b} \quad (\text{when } N_w = N_b),$$

where
$E_b$ is the entropy of the black run statistics (bits/run)
$r_w$ is the average white run-length (pels/run)
$r_b$ is the average black run-length (pels/run)
$N_w$, $N_b$ are the number of white and black runs, respectively
E is the entropy in bits/pel.
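
The performance measure of eq. (1) can be sketched for a single line of two-level data as follows. This is an illustrative implementation, not the simulation program used for the results; the function names and the sample line are assumptions. The denominator $r_w N_w + r_b N_b$ is simply the total number of pels.

```python
from collections import Counter
from itertools import groupby
from math import log2

def run_lengths(line):
    """Return (white_runs, black_runs) as lists of run lengths; 1 is white, 0 is black."""
    runs = {0: [], 1: []}
    for value, group in groupby(line):
        runs[value].append(len(list(group)))
    return runs[1], runs[0]

def run_entropy(runs):
    """Sample first-order entropy (bits/run) of a list of run lengths."""
    total = len(runs)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in Counter(runs).values())

def bits_per_pel(line):
    """Eq. (1): (E_w N_w + E_b N_b) / (r_w N_w + r_b N_b)."""
    white, black = run_lengths(line)
    bits = run_entropy(white) * len(white) + run_entropy(black) * len(black)
    return bits / len(line)

line = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]   # illustrative scan line
white, black = run_lengths(line)
entropy = bits_per_pel(line)
print(white, black, round(entropy, 3))
```

For this line the white runs are 2, 1, 2 and the black runs are 1, 2, 2, so each colour contributes the same entropy per run and the result is about 0.55 bit/pel.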


2.1 Prediction algorithm 


Consider a dithered picture element $S_{ij}$ at position (i,j) where the
dither matrix has value $D_{ij}$. To develop a predictor for $S_{ij}$, consider
surrounding elements W, X, Y, Z as shown in Fig. 3. Note that elements
W, X, Y, Z and the value of the dither matrix $D_{ij}$ are known to the re-
ceiver when decoding the signal at point (i,j). The state associated with
$S_{ij}$ is defined as the five-tuple

$$Q = (D_{ij}, W, X, Y, Z). \qquad (2)$$

Since we are using a 4 by 4 dither matrix, $D_{ij}$ can have 16 different values.
Each W, X, Y, Z can have two values, and, therefore, the number of
possible states is 256. Let these be denoted by the set $\{Q_k\}$, k = 1, ..., 256.

Fig. 2—Dithered images used for computer simulation of coding algorithms. (a) Karen. (b) Engineering drawing. (c) House.

Fig. 3—Configuration for definition of state. $S_{ij}$: picture element being coded. $D_{ij}$: value of dither matrix element.


Then a predictor code book is given by:

$$C(Q_k) = \begin{cases} \text{“0”} & \text{if } P(S_{ij} = \text{“0”} \mid Q = Q_k) \ge 0.5 \\ \text{“1”} & \text{otherwise,} \end{cases} \qquad (3)$$

where C(·) is the predictor code book and P(·|·) is the experimentally
determined conditional probability that a picture element has a value
of “0” or “1” given the state $Q_k$. The predictor thus depends upon
the previously transmitted values in a neighborhood and some partial
information about the present picture element known both to the
transmitter and the receiver in the form of the value of the dither matrix.
The code book can be designed for each picture and transmitted before
actual picture transmission,* or it can be taken to be an average for a
class of pictures. A portion of the code book for each of the three pictures
that we considered is shown in Table I, where we assign the state number,
k, by computing formula (4), and show the number of occurrences of each
state and the probability of error.


$$k = D_{ij} + 8W + 4X + 2Y + Z + 1. \qquad (4)$$


We note that the probability of prediction being in error is always less 
than 0.5 due to our method of prediction. The code book defined by eq. 
(3) does vary from picture to picture. We examine the effects of such 
variation in a later section. 
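
The construction of a code book from eqs. (3) and (4) can be sketched as below. Several points are assumptions made for illustration: the neighbour layout (W, X, Y on the previous line and Z the previous element on the present line) stands in for the configuration of Fig. 3, elements outside the picture are taken as 0, and the dither pattern and test image are hypothetical. Dither values are taken as multiples of 16, which is consistent with the state numbers listed in Table I.

```python
from collections import defaultdict

# Hypothetical 4 by 4 pattern with values 0, 16, ..., 240 (NOT the matrix of Fig. 1).
DITHER = [
    [0, 128, 32, 160],
    [192, 64, 224, 96],
    [48, 176, 16, 144],
    [240, 112, 208, 80],
]

def state_number(d, w, x, y, z):
    """Eq. (4): k = D_ij + 8W + 4X + 2Y + Z + 1."""
    return d + 8 * w + 4 * x + 2 * y + z + 1

def neighbours(img, i, j):
    """Assumed layout: W, X, Y on the previous line, Z the previous pel on this line."""
    def pel(r, c):
        inside = 0 <= r < len(img) and 0 <= c < len(img[0])
        return img[r][c] if inside else 0
    return pel(i - 1, j - 1), pel(i - 1, j), pel(i - 1, j + 1), pel(i, j - 1)

def build_code_book(img, dither=DITHER):
    """Map each observed state k to (predicted value, probability of prediction error)."""
    counts = defaultdict(lambda: [0, 0])   # occurrences of "0" and "1" per state
    n = len(dither)
    for i, row in enumerate(img):
        for j, s in enumerate(row):
            k = state_number(dither[i % n][j % n], *neighbours(img, i, j))
            counts[k][s] += 1
    book = {}
    for k, (zeros, ones) in counts.items():
        prediction = 0 if zeros >= ones else 1          # eq. (3)
        book[k] = (prediction, min(zeros, ones) / (zeros + ones))
    return book

# Small illustrative two-level image.
image = [
    [1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
code_book = build_code_book(image)
print(len(code_book))
```

Because the prediction is the majority value in each state, the error probability recorded for a state can never exceed 0.5.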


2.2 Run-length coding of prediction errors 


In this technique, we code the run-lengths of the prediction errors
along a scan line using the appropriate code book for each picture. The
entropy of the run-length statistics is given in Table II. It varies between
0.22 and 0.42 bit/pel. To evaluate the effects of the variation of code book
with respect to pictures, we used the code book of one picture for the
prediction of another. The resulting entropies of the run-lengths of
prediction errors are also shown in Table II. As expected, when a code
book is used that is not specifically matched to the picture, there is a loss
of coding efficiency. However, by using the code book for the picture of
Fig. 2a for all the pictures, there is a maximum loss of only 0.08 bit/pel.
Thus, although better results in terms of entropy can be obtained by
using a matched code book, it appears possible to use a general code book
that will not degrade the performance significantly.

* Only 256 bits are needed to transmit a code book; since there are (512)² picture ele-
ments per picture, transmission of one code book per picture corresponds to an additional
0.001 bit/pel.

Table I—State-dependent predictors for different pictures. Also shown are the probability of prediction error and the number of
elements in each state. Only the first 54 states are shown for brevity.

       State Definition     Picture: Karen              Picture: Engineering Drawing   Picture: House
State  Dither               Total      Pre-    Prob. of   Total      Pre-    Prob. of   Total      Pre-    Prob. of
Number Matrix  W  X  Y  Z   Elements   diction Prediction Elements   diction Prediction Elements   diction Prediction
       (D_ij)               in State           Error      in State           Error      in State           Error

  1      0     0  0  0  0     4026       1      0.018       3197       1      0.041       6225       1      0.000
  2      0     0  0  0  1       12       1      0.000        197       1      0.000          0       0      0.000
  3      0     0  0  1  0       77       1      0.000        760       1      0.011         32       1      0.000
  4      0     0  0  1  1        2       1      0.000         97       1      0.000          0       0      0.000
  5      0     0  1  0  0        0       0      0.000          0       0      0.000          0       0      0.000
  6      0     0  1  0  1        0       0      0.000          0       0      0.000          0       0      0.000
  7      0     0  1  1  0        0       0      0.000          0       0      0.000          0       0      0.000
  8      0     0  1  1  1        0       0      0.000          0       0      0.000          0       0      0.000
  9      0     1  0  0  0     2590       1      0.000       1489       1      0.003       5665       1      0.000
 10      0     1  0  0  1       43       1      0.000        429       1      0.000          0       0      0.000
 11      0     1  0  1  0     6121       1      0.000       2342       1      0.002       4204       1      0.000
 12      0     1  0  1  1     3258       1      0.000       7618       1      0.000          3       1      0.000
 13      0     1  1  0  0        0       0      0.000          0       0      0.000          0       0      0.000
 14      0     1  1  0  1        0       0      0.000          0       0      0.000          0       0      0.000
 15      0     1  1  1  0        0       0      0.000          0       0      0.000          0       0      0.000
 16      0     1  1  1  1        0       0      0.000          0       0      0.000          0       0      0.000
 17     16     0  0  0  0     2938       1      0.279       3259       1      0.417       4217       1      0.047
 18     16     0  0  0  1        4       1      0.000        108       1      0.000          0       0      0.000
 19     16     0  0  1  0       61       1      0.000        785       1      0.135         10       1      0.000
 20     16     0  0  1  1        0       0      0.000         32       1      0.000          0       0      0.000
 21     16     0  1  0  0        0       0      0.000          2       0      0.000          0       0      0.000
 22     16     0  1  0  1        0       0      0.000          0       0      0.000          0       0      0.000
  ⋮


2.3 Ordering techniques 


These ordering techniques are extensions of our techniques for two- 
level pictures.» In these techniques, we order either the elements or the 
prediction errors of the present line using a reference signal available 
to both the transmitter and the receiver—for example, the elements of 
the previous line. 

To illustrate the technique, consider a memory containing 512 cells 
(equal to the number of elements per line). Suppose the cells of this 
memory are numbered from 1 to 512. If the first element of the previous 
line is white, then we put the prediction error for the first element of the 
present line in memory cell 1; if the first element of the previous line is 
black, then we put the prediction error for the first element of the present 
line in memory cell 512. We continue in this manner: the prediction error
of the ith element of the present line is put in the unfilled memory cell
of smallest index or of largest index depending on whether the ith ele-
ment of the previous line is white or black. When the memory is filled,
its cells are read in numerical order and the contents are run-length
encoded. It is easy to see that the present line can be uniquely recon- 
structed from the knowledge of the run-lengths of the ordered line, since 
the ordering information is known to the receiver. 
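
The ordering procedure and its inverse can be sketched as follows. The function names and the sample data are hypothetical, and cells are indexed from 0 rather than from 1; white is represented by 1 and black by 0.

```python
def order_line(errors, reference):
    """Fill memory from the low end when the reference pel is white (1), else from the high end."""
    n = len(errors)
    memory = [None] * n
    low, high = 0, n - 1
    for err, ref in zip(errors, reference):
        if ref == 1:
            memory[low] = err
            low += 1
        else:
            memory[high] = err
            high -= 1
    return memory

def restore_line(memory, reference):
    """Receiver side: the same reference line tells which end each value was written to."""
    low, high = 0, len(memory) - 1
    errors = []
    for ref in reference:
        if ref == 1:
            errors.append(memory[low])
            low += 1
        else:
            errors.append(memory[high])
            high -= 1
    return errors

previous_line = [1, 0, 1, 1, 0, 0, 1, 0]   # reference known to both ends
errors = [0, 1, 0, 0, 1, 1, 0, 0]           # prediction errors of the present line

ordered = order_line(errors, previous_line)
restored = restore_line(ordered, previous_line)
print(ordered, restored == errors)
```

In this example the ordered line is 0 0 0 0 0 1 1 1, so eight alternating values collapse into two runs; the gain in practice depends on how well the reference line correlates with the values being ordered.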

The efficiency of the simple ordering technique discussed above is 
given in Table II. It is seen that due to the process of dithering, much 
of the efficiency of ordering is lost. We overcome the effects of dithering 
by using an ordering technique based on the “goodness” of the state. 
We divide the states defined in eq. (2) into two groups. States that have 
a high probability of correct prediction are called “good” states and the 
remaining are called “bad” states. Our algorithms can be described as 
follows: we first evaluate the prediction error for a particular element 
of the present line, and then, if the state is “good,” we put the prediction 
error in the unfilled memory cell with the smallest index; if the state is 
“bad,” the prediction error is put in the unfilled memory cell with the 
largest index. Having ordered the prediction errors, we run-length code 
them as before. 
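
A simplified sketch of “good-bad” ordering, including the receiver side, is given below. To keep it short, the state is reduced to the pair (dither value, previous element on the present line) instead of the five-tuple of eq. (2), and the code book entries are invented for illustration; all names are hypothetical. The point of the sketch is that the receiver can recompute each state from elements it has already decoded, so it always knows which end of the memory to read.

```python
def encode_line(line, thresholds, book, limit=0.05):
    """Order prediction errors: good states fill from the low end, bad states from the high end."""
    n = len(line)
    memory = [None] * n
    low, high = 0, n - 1
    previous = 0                      # element before the line start is taken as 0
    for pel, d in zip(line, thresholds):
        prediction, p_error = book[(d, previous)]
        error = pel ^ prediction
        if p_error <= limit:          # "good" state
            memory[low] = error
            low += 1
        else:                         # "bad" state
            memory[high] = error
            high -= 1
        previous = pel
    return memory

def decode_line(memory, thresholds, book, limit=0.05):
    """Rebuild the line; each state is known before the corresponding error is read."""
    low, high = 0, len(memory) - 1
    previous = 0
    line = []
    for d in thresholds:
        prediction, p_error = book[(d, previous)]
        if p_error <= limit:
            error = memory[low]
            low += 1
        else:
            error = memory[high]
            high -= 1
        pel = prediction ^ error
        line.append(pel)
        previous = pel
    return line

# Invented code book: (dither value, previous pel) -> (prediction, probability of error).
book = {
    (0, 0): (1, 0.02), (0, 1): (1, 0.01),
    (128, 0): (0, 0.30), (128, 1): (1, 0.40),
}
thresholds = [0, 128] * 4
line = [1, 0, 1, 1, 1, 0, 1, 0]

ordered_errors = encode_line(line, thresholds, book)
decoded = decode_line(ordered_errors, thresholds, book)
print(ordered_errors, decoded == line)
```

Errors from good states are almost always zero, so placing them together at one end of the memory tends to produce one long run, while the less predictable errors are confined to the other end.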

It is easy to see that the line of picture data can be uniquely recon-
structed from the coded run-lengths of the prediction error. The en-
tropies obtained by this scheme are given in Table II. The criterion of
“goodness” of a state is determined by a threshold on the probability
of error. Our simulations indicate that a threshold of 5-percent proba-
bility of error does as well as or better than thresholds of 2 percent,
10 percent, and 20 percent for all three pictures. The entropy is reduced
to between 0.20 and 0.30 bit/pel. The advantage of ordering, obtained by
comparing these entropies with those obtained from run-length coding of
the prediction errors, is about 9 to 29 percent. We also considered the use
of the prediction code book of Fig. 2a for all pictures. This resulted in a
small increase in entropy over that obtained by using a matched code book.

Table II—Entropy comparisons for different coding algorithms

                                                                          Entropy (Bit/Pel)
                                                                  Picture I   Picture II       Picture III
Algorithms                                                        “Karen”     “Engineering     “House”
                                                                              Drawing”

Coding Algorithms
I.   Run-length coding of prediction errors
       using code book of picture I                                0.33        0.47             0.30
       using code book of picture II                               0.38        0.42             0.44
       using code book of picture III                              0.45        0.65             0.22

Ordering Algorithms
II.  Run-length coding of ordered elements ‘X’ with respect to elements ‘Y’
  (a) X: Samples of present line; Y: prediction of present sample  0.48        0.63             0.62
  (b) X: Prediction errors of present line;
      Y: samples of previous line                                  0.37        0.44             0.29
  (c) X: Prediction errors of present line; Y: good-bad states
        Goodness threshold: 0.02                                   0.29        0.39             0.22
                            0.05                                   0.26        0.30             0.20
                            0.1                                    0.29        0.30             0.22
                            0.2                                    0.31        0.40             0.22
  (d) X: Prediction errors of present line; Y: good-bad states;
      code book of picture I
        Goodness threshold: 0.05                                   0.26        0.40             0.23

Algorithms of Judice
III. One-dimensional bit interleaving                              0.37        0.50             0.27
IV.  Two-dimensional bit interleaving                              0.33        0.48             0.22
V.   Pattern matching                                              0.40        0.61             0.28


2.4 Comparisons with the algorithms of Judice 


We mentioned earlier that Judice has recently given two algorithms 
for coding of dithered two-level pictures. In one of them,¹² runs of picture
elements corresponding to the same values of the dither matrix are 
run-length coded. He has discussed this scheme in one dimension (called 
one-dimensional bit interleaving) as well as two dimensions (called 
two-dimensional bit interleaving). The bit rates obtained by these two 
schemes are reproduced from Ref. 12 in Table II. A bit rate of 0.22 to 0.48 
bit/pel is possible with these schemes. This is about 10- to 60-percent 
higher than the entropies obtainable from our “good-bad” state ordering 
schemes. The other scheme discussed by Judice et al.,¹³ called pattern
matching, assigns a code to two-dimensional bit patterns of the dithered 
images. The entropies obtained in this case, also reproduced in Table 
II from Ref. 13, are generally higher than those achieved by two-di- 
mensional bit interleaving. Thus, our “good-bad” state ordering schemes 
perform more efficiently than the schemes proposed by Judice. 
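
The percentages quoted in Sections 2.3 and 2.4 can be reproduced from the entries of Table II. The short calculation below is illustrative; it assumes that the 10- to 60-percent figure compares two-dimensional bit interleaving with “good-bad” ordering at the 0.05 threshold, which is the reading that matches the quoted range.

```python
# Entropies in bit/pel taken from Table II (matched code books).
pictures = ["Karen", "Engineering Drawing", "House"]
run_length_only = [0.33, 0.42, 0.22]    # algorithm I, matched code book
good_bad_ordering = [0.26, 0.30, 0.20]  # algorithm II(c), goodness threshold 0.05
judice_2d = [0.33, 0.48, 0.22]          # algorithm IV, two-dimensional bit interleaving

# Percentage reduction obtained by ordering, relative to run-length coding alone.
ordering_gain = [
    100 * (plain - ordered) / plain
    for plain, ordered in zip(run_length_only, good_bad_ordering)
]

# Percentage by which the bit-interleaving entropy exceeds that of good-bad ordering.
judice_excess = [
    100 * (other - ordered) / ordered
    for other, ordered in zip(judice_2d, good_bad_ordering)
]

for name, gain, excess in zip(pictures, ordering_gain, judice_excess):
    print(f"{name}: ordering gain {gain:.0f}%, bit interleaving higher by {excess:.0f}%")
```

The extremes of both lists agree with the ranges stated in the text: roughly 9 to 29 percent for the advantage of ordering and roughly 10 to 60 percent for the excess of the bit-interleaving entropies.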


III. DISCUSSION AND SUMMARY


We have described schemes for efficient coding of dithered two-level 
signals. We started with the description of a predictor that depends upon 
already transmitted neighboring elements and the value of the dither 
matrix at the element being predicted. The predictor minimizes the 
probability of prediction being in error. We found that the run-length 
coding of the prediction errors brought the bit rate down to about 0.22 
to 0.42 bit/pel for the three pictures we used for simulation. We then 
discussed several ordering algorithms in which the relative order of 
transmission of the picture elements in a scan line is changed to increase 
the average lengths of black and white runs. We found that the ordering 
scheme based on goodness of the state decreased the bit rate to 0.20 to 
0.30 bit/pel. Finally, we compared our results with those obtained by 
Judice and found that his schemes gave about 10 to 60 percent higher 
entropy. 
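
As a rough illustration of how a figure in bit/pel can be estimated from run-length coding of a two-level sequence such as the prediction errors, the sketch below computes the empirical first-order entropy cost of the black and white run lengths and divides the total by the number of picture elements. The function name and the choice of separate run-length statistics for the two colors are assumptions made for illustration; they are not taken from the simulations reported above.

```python
from collections import Counter
from itertools import groupby
from math import log2


def run_length_bits_per_pel(bits):
    """Estimate bits per pel for ideal entropy coding of run lengths.

    `bits` is a sequence of 0s and 1s (for example, prediction errors along
    a scan line). Runs of 0s and runs of 1s are given separate empirical
    length distributions (an assumption of this sketch).
    """
    if not bits:
        raise ValueError("empty sequence")

    # Split the sequence into (value, run_length) pairs.
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(bits)]

    # Empirical distribution of run lengths for each color.
    counts = {0: Counter(), 1: Counter()}
    for value, length in runs:
        counts[value][length] += 1

    # Ideal code length of each run is -log2 of its empirical probability.
    total_bits = 0.0
    for value, length in runs:
        prob = counts[value][length] / sum(counts[value].values())
        total_bits += -log2(prob)

    return total_bits / len(bits)


if __name__ == "__main__":
    errors = [0, 0, 0, 1, 0, 1, 1, 0]
    print(run_length_bits_per_pel(errors))
```

Longer average runs spread the cost of each run over more picture elements, which is why reordering elements to lengthen the runs can lower the estimate.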
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It should be mentioned that this is not a definitive coding system 
study. We have not considered many important factors crucial to the 
success of any coding system, such as run-length codes and their picture 
dependence, and the effect of transmission errors. 
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A Probability Inequality and Its Application to 
Switching Networks 
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The use of channel graphs to study the blocking probabilities of 
multistage switching networks was first proposed by Lee and has gained 
popularity ever since. A channel graph between an input terminal and 
an output terminal is the union of all paths connecting them in the 
network. Usually, the assumption is made that links connecting the 
same two stages, say the ith stage and the (i + 1)st stage, have constant 
and identical probability $p_i$ of being busy. Let $G(s,\lambda)$ denote the class
of channel graphs with $s$ stages and $\lambda$ paths. We show that for every
channel graph in $G(s,\lambda)$ with multiple links, there exists a channel
graph in the same class without multiple links that has smaller or equal
blocking probabilities for all $\{p_i\}$. We obtain this result by first proving
a probability inequality of a more general nature. 


I. INTRODUCTION 


In this paper we consider a switching network as a directed graph. A 
vertex is called a switch if its in-degree and out-degree are both positive, 
an input terminal if it has in-degree zero and out-degree one, and an 
output terminal if it has in-degree one and out-degree zero. The edges 
between the switches are called links. A switch is said to be of size
$n \times m$ if it has in-degree n and out-degree m. Every switch in our network
is assumed to be two-sided nonblocking in the sense that when the 
network is in actual use, traffic can be routed from every input link to 
every output link in a switch, provided the two links involved are not 
carrying other traffic, and regardless of the traffic carried by other 
links. 

In a multistage switching network, the switches are partitioned into 
a sequence of stages with the following properties. 


(i) The sizes of switches in a given stage are identical.
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(ii) All input terminals are connected to the switches of the first stage;
all output terminals are connected to the switches of the last stage.

(iii) Links exist only between two switches in adjacent stages. [We
call links between the ith stage and the (i + 1)st stage the ith-stage links.]
The direction of an ith-stage link is from the ith stage to the (i + 1)st 
stage. 


A channel graph between a given input terminal and a given output 
terminal is the smallest subgraph containing all paths connecting the 
two terminals. Since a link in a path is also shared by other paths con- 
necting possibly other pairs of terminals, the actual routing of a path will 
fail if any link involved has already been used to route some other path. 
In that case, we say that the path is blocked. The blocking probability 
of a channel graph is the probability that every path in it is blocked. 

Lee® first suggested the use of channel graphs to study the blocking 
performances of switching networks. Usually, the assumption that each 
ith-stage link has the constant and independent probability $p_i$ of being
busy (meaning the link is used in routing some other path) is made to 
simplify the computations of blocking probabilities. Lee’s method has 
gained popularity both in theory and in practice since its proposal. 
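
The independence assumption makes the blocking probability of a small channel graph easy to evaluate by brute force. The following is a minimal sketch, not Lee's own procedure: it enumerates every busy/idle state of the links, weights each state by its probability, and sums the weights of the states in which every path contains a busy link. Representing a link as a `(stage, index)` pair and a path as a list of links is an assumption made for this illustration.

```python
from itertools import product


def blocking_probability(paths, stage_busy_prob):
    """Blocking probability of a channel graph by exhaustive enumeration.

    paths: list of paths, each a list of links; a link is a (stage, index)
        pair (a hypothetical representation chosen for this sketch).
    stage_busy_prob: dict mapping stage number i to p_i, the probability
        that any ith-stage link is busy; links are assumed independent.
    """
    links = sorted({link for path in paths for link in path})
    total = 0.0
    for state in product([False, True], repeat=len(links)):
        busy = dict(zip(links, state))
        # Probability of this particular busy/idle configuration.
        weight = 1.0
        for link in links:
            p = stage_busy_prob[link[0]]
            weight *= p if busy[link] else 1.0 - p
        # The graph is blocked when every path has at least one busy link.
        if all(any(busy[link] for link in path) for path in paths):
            total += weight
    return total


if __name__ == "__main__":
    p = {1: 0.1, 2: 0.2}
    # Two link-disjoint two-link paths.
    disjoint = [[(1, 0), (2, 0)], [(1, 1), (2, 1)]]
    # Two paths sharing the same first-stage link.
    shared = [[(1, 0), (2, 0)], [(1, 0), (2, 1)]]
    print(blocking_probability(disjoint, p))
    print(blocking_probability(shared, p))
```

The enumeration grows as 2 to the number of links, so it is useful only for checking small graphs against closed-form expressions.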

A class of multistage switching networks that has been widely used 
but only recently has come under systematic study is the class of bal- 
anced networks‘. Balanced networks are characterized by the property 
that the channel graphs for all pairs of input terminals and output ter- 
minals are isomorphic. Thus, the blocking performance of a balanced 
network can be studied by analyzing just one channel graph. 

Let $G(s,\lambda)$ denote the class of channel graphs with $s$ stages and $\lambda$ paths.
Comparisons of channel graphs in a given $G(s,\lambda)$ have been made in Refs.
1 and 3. This paper is a continuation of this study. We are particularly 
interested in channel graphs with multiple links. Networks with multiple 
links between a pair of switches have recently been studied by Fontenot.? 
In this paper, we show that for every such channel graph, there is a 
channel graph in the same class but without multiple links with an equal 
or smaller blocking probability for any arbitrarily given set $\{p_i\}$. In some
cases, a switching network constructed using such a channel graph has 
a larger number of crosspoints than the corresponding multiple-link 
network. In other cases, however, our construction produces a network 
with the same number of crosspoints (and therefore cost), but lower 
blocking probability. This is illustrated by a simple example at the end 
of our paper. 


II. A PROBABILITY INEQUALITY


We prove a probability inequality which is itself of some interest and 
has application to our study of channel graphs. 
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Theorem 1: 


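\[
\prod_{i=1}^{k} \left(1 - p_i^{c_i}\right) \le 1 - \left\{1 - \prod_{i=1}^{k} (1 - p_i)\right\}^{\prod_{i=1}^{k} c_i},
\]

where $p_i$ and $c_i$ are real numbers satisfying $1 \ge p_i \ge 0$ and $c_i \ge 1$ for
$i = 1, \dots, k$.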
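Proof: Proof is by induction on k. Theorem 1 is trivially true for k = 1.
For general k, assume Theorem 1 is true for all $k' = 1, \dots, k - 1$.

Let $b = \prod_{i=1}^{k-1} c_i$, $y = \prod_{i=1}^{k-1} (1 - p_i)$, $p_k = p$, and $c_k = c$. Then $b \ge 1$ and
$1 \ge y \ge 0$. By induction,

\[
\prod_{i=1}^{k} \left(1 - p_i^{c_i}\right) \le (1 - p^c)\{1 - (1 - y)^b\}. \tag{1}
\]

It is sufficient to prove that

\[
(1 - p^c) - (1 - p^c)(1 - y)^b \le 1 - \{1 - (1 - p)y\}^{bc},
\]

or equivalently,

\[
\{1 - (1 - p)y\}^{bc} \le p^c + (1 - p^c)(1 - y)^b. \tag{2}
\]

Let $z = p^c$. Then $1 \ge z \ge 0$. Ineq. (2) can be written as

\[
\{1 - (1 - z^{1/c})y\}^{bc} \le z + (1 - z)(1 - y)^b. \tag{3}
\]

We first show that

\[
f(z,y) \equiv \{1 - (1 - z^{1/c})y\}^{c} \le 1 - (1 - z)y \equiv g(z,y). \tag{4}
\]

Clearly f(1,y) = g(1,y). Furthermore,

\[
\frac{\partial}{\partial z} f(z,y) = c\{1 - (1 - z^{1/c})y\}^{c-1}\, y\, \frac{1}{c}\, z^{1/c - 1} \ge y = \frac{\partial}{\partial z} g(z,y),
\]

since

\[
\{1 - (1 - z^{1/c})y\}^{c-1} z^{1/c - 1} \ge \{1 - (1 - z^{1/c})\}^{c-1} z^{1/c - 1} = 1.
\]

Therefore, Ineq. (4) is true. Consequently, to prove Ineq. (3), it suffices
to prove

\[
h(z,y) \equiv g(z,y)^b = \{1 - (1 - z)y\}^b \le z + (1 - z)(1 - y)^b \equiv u(z,y). \tag{5}
\]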


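We have h(z,0) = u(z,0). Furthermore,

\[
\frac{\partial}{\partial y} h(z,y) = -b(1 - z)\{1 - (1 - z)y\}^{b-1} \le -b(1 - z)(1 - y)^{b-1} = \frac{\partial}{\partial y} u(z,y),
\]
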
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since 
\[ 1 \ge 1 - z \ge 0 \]
and
\[ 1 - (1 - z)y \ge 1 - y. \]

Therefore, Ineq. (5) is true. The proof is completed. 

It is easy to construct counterexamples to Theorem 1 if the conditions
$c_i \ge 1$ for $i = 1, \dots, k$ are violated.
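
Both remarks can be checked numerically. The sketch below evaluates the two sides of the inequality of Theorem 1, tests it on randomly drawn $p_i$ in [0, 1] and $c_i$ in [1, 10], and then evaluates one case with $c_1 = c_2 = 1/2$ in which the inequality fails. The function names, the sampling ranges, and the floating-point tolerance are choices made for this illustration; a random test supports the theorem but does not replace the proof.

```python
import math
import random


def lhs(p, c):
    """Left side of Theorem 1: product over i of (1 - p_i ** c_i)."""
    return math.prod(1.0 - pi ** ci for pi, ci in zip(p, c))


def rhs(p, c):
    """Right side: 1 - {1 - product of (1 - p_i)} ** (product of c_i)."""
    return 1.0 - (1.0 - math.prod(1.0 - pi for pi in p)) ** math.prod(c)


def random_check(trials=10_000, k_max=5, seed=0):
    """Return True if lhs <= rhs (within rounding) on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(1, k_max)
        p = [rng.random() for _ in range(k)]
        c = [1.0 + 9.0 * rng.random() for _ in range(k)]  # every c_i >= 1
        if lhs(p, c) > rhs(p, c) + 1e-12:
            return False
    return True


if __name__ == "__main__":
    print("holds on random trials:", random_check())

    # A case violating c_i >= 1: here the left side exceeds the right side.
    p_bad, c_bad = [0.25, 0.25], [0.5, 0.5]
    print(lhs(p_bad, c_bad), rhs(p_bad, c_bad))
```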


III. A THEOREM ON CHANNEL GRAPHS

Consider a channel graph which contains the subgraph of Fig. 1, 




Fig. 1—Graph with multiple links. 


where A is an ith-stage switch, B an (i + 1)st-stage switch, C an (i +
2)nd-stage switch, and max{m,n} > 1. Since B is a two-sided nonblocking
switch, there are nm paths from A to C. Let $p_i$, $i = 1, \dots, s$, be the
probability that an ith-stage link is busy. We show that if we replace Fig. 1
with the subgraph of Fig. 2, 





Fig. 2—Graph without multiple links. 


then the new channel graph, clearly in the same class $G(s,\lambda)$, will have
an equal or smaller blocking probability. It suffices to show that the
blocking probability of the graph in Fig. 1 is equal to or larger than that of
the graph in Fig. 2. Routing from A to C can be realized in Fig. 1 if at least 
one link from each of the n and m links is available. The probability of 
this event is 


\[ (1 - p_i^{n})(1 - p_{i+1}^{m}). \]
The same routing can be realized in Fig. 2 if at least one of the nm two- 
link paths (see Fig. 3) 
A           B           C
O-----------O-----------O
Fig. 3—Two-link path. 
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is nonblocking. The probability of this event is 
\[ 1 - \{1 - (1 - p_i)(1 - p_{i+1})\}^{nm}. \]


That the first probability is equal to or smaller than the second proba-
bility is an immediate consequence of Theorem 1 with k = 2, $c_1 = n$, and
$c_2 = m$. Therefore we have proved the following theorem.
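
The comparison can be made concrete with the two expressions above. The following sketch evaluates the blocking probability of the multiple-link subgraph of Fig. 1 and of the subgraph of Fig. 2 with nm two-link paths, and checks over a grid of link occupancies that the first is never smaller than the second. The function names and the grid are assumptions chosen for this illustration.

```python
def blocking_multiple_links(p_i, p_next, n, m):
    """Blocking probability for Fig. 1: n parallel ith-stage links from
    A to B followed by m parallel (i + 1)st-stage links from B to C."""
    return 1.0 - (1.0 - p_i ** n) * (1.0 - p_next ** m)


def blocking_disjoint_paths(p_i, p_next, n, m):
    """Blocking probability for Fig. 2: n * m two-link paths from A to C,
    each blocked unless both of its links are idle."""
    return (1.0 - (1.0 - p_i) * (1.0 - p_next)) ** (n * m)


def never_worse(n, m, steps=50):
    """Check on a grid that Fig. 2 blocks no more often than Fig. 1."""
    grid = [j / steps for j in range(steps + 1)]
    return all(
        blocking_disjoint_paths(a, b, n, m)
        <= blocking_multiple_links(a, b, n, m) + 1e-12
        for a in grid
        for b in grid
    )


if __name__ == "__main__":
    print(blocking_multiple_links(0.1, 0.2, 2, 2))   # about 0.0496
    print(blocking_disjoint_paths(0.1, 0.2, 2, 2))   # about 0.0061
    print(never_worse(2, 3))
```

With $p_i = 0.1$, $p_{i+1} = 0.2$, and n = m = 2, the multiple-link subgraph blocks with probability about 0.0496, against about 0.0061 for the four two-link paths.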


Theorem 2: For every channel graph with multiple links, there is a 
channel graph in the same $G(s,\lambda)$ class without multiple links that has
equal or smaller blocking probability. 


Example. Let us compare the blocking probabilities of the two five- 
stage balanced networks in Fig. 4 and Fig. 5. 





Fig. 4—Network with multiple links. Fig. 5—Network without multiple links.


The two channel graphs are shown in Fig. 6 and Fig. 7, respectively. 


Fig. 6—Channel graph of network with multiple links. 


Fig. 7—Channel graph of network without multiple links. 


By Theorem 1, the blocking probability of the channel graph in Fig. 6 
is equal to or greater than that of the channel graph in Fig. 7. Therefore, 
we conclude that the network in Fig. 5 has smaller blocking probabilities 
than the one in Fig. 4. Note that the two networks are identical except 
for the way the switches are linked. 
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pounds. 


ELECTRONIC AND ELECTRICAL ENGINEERING 


Frequency-Agile Millimeter-Wave Phase Lock System. P. S. Henry, Rev. Sci. Instrum. 
47 (September 1976), pp. 1020-1025. A frequency-agile phase lock system for milli- 
meter-wave klystrons is described. The system locks the klystron to a crystal-controlled 
reference signal derived from a frequency synthesizer. By programming the synthesizer, 
the klystron can be stepped through any sequence of frequencies lying within a band 
roughly 200 MHz wide. 


Optical-Fiber Impulse-Response Measurement System. J. W. Dannwolf, S. Gottfried, 
G. A. Sargent, R. C. Strum, IEEE Trans. Instrum. Meas., IM-25, No. 4 (December 1976),
pp. 401-406. This paper describes time-domain instrumentation designed to measure 
impulse response and delay of multimode optical fibers used in an experimental optical 
communications system at Bell Laboratories. Time-domain data is transformed to fre- 
quency-domain by a minicomputer, and the result is displayed as the fiber’s baseband 
frequency response. 


Tantalum Thin Film RC Circuit Technology for a Universal Active Filter. W. 
Worobey and J. Rutkiewicz, IEEE Trans. Parts, Hybrids and Packag., PHP-12, No. 
4 (December 1976), pp. 276-282. This paper describes the physical layout, process 
sequence, and component properties of an RC universal active filter. The high-precision 
filter is fabricated on a 16-pin dual in-line package ceramic substrate using tantalum thin 
film technology. It is comprised of 300 Ω/□ resistors, 190 V anodized tantalum capacitors,
and an operational amplifier. 


The Use of Echo Time-Weighting to Derive Oscilloscope Graticules for Rating 
Television Transmission Performance. R. W. Edmonds, SMPTE J., 85, No. 6 (June 
1976), pp. 393-396. This paper describes a method for designing oscilloscope graticules 
for measuring the short-time waveform performance of broadcast television systems. The 
design is based on recently obtained single-echo time-weighting functions for monochrome 
and color signals. 


Using Triangularly Weighted Interpolation to Get 13-Bit PCM from a Sigma-Delta 
Modulator. J. C. Candy, Y. C. Ching, and D. S. Alexander, IEEE Trans. Commun., 24, 
No. 11 (November 1976), pp. 1268-1275. Accumulating a weighted sum of sigma-delta
codes generates a high-resolution PCM signal. Several weighting methods are evaluated 
with regard to resolution and spectral response; a triangular weighting is near optimum.
Implementation of a 13-bit PCM encoder is described and a method for overcoming a 
threshold phenomenon is presented. 


MATERIALS SCIENCE 


Electrical, Structural and Optical Properties of Amorphous Carbon. J. J. Hauser, 
J. Noncrystal. Solids, 23 (January 1977), pp. 21-41. The planar and transverse
electrical resistivity of amorphous carbon (a-C) films is well fitted by the expression
$\rho = \rho_0 \exp[(T_0/T)^{1/4}]$. Films thinner than 600 Å display a two-dimensional hopping conductivity
from which one deduces a density of states $N(E_F)$ at the Fermi level of $10^{18}$ eV$^{-1}$ cm$^{-3}$ and
a radius of the localized wave functions (a) of 12 Å.


Epitaxial Structures with Alternate-Atomic-Layer Composition Modulation. A. 
C. Gossard, P. M. Petroff, W. Weigmann, R. Dingle, and A. Savage, Appl. Phys. Lett., 29,
No. 6 (15 September 1976), pp. 323-325. Epitaxial structures grown by alternate
monolayer depositions of GaAs and AlAs are reported. As many as 104 alternate (100) layers
of GaAs and AlAs as thin as 1.0 ± 0.1 and 1.0 ± 0.1 monolayers, respectively, were deposited
and studied by transmission electron microscopy and optical techniques. 


The Optical Properties of a Soda-Lime-Silica Glass in the Region From 0.006 to 22 
eV. B. G. Bagley, E. M. Vogel, W. G. French and G. A. Pasteur, J. N. Gan, and J. Tauc,*
J. Non-Crystalline Sol, 22 (November/December 1976), pp. 423-436. From the mea- 
sured absorption and reflection spectra, we have determined the optical properties of a 
well-characterized (with respect to impurities and homogeneity) high-purity 21.3 wt% 
Na2O—5.2 wt% CaO—73.5 wt% SiO2 glass over the energy range 0.006—22 eV. The origins
of the absorption spectra are discussed. *Brown University.


832 THE BELL SYSTEM TECHNICAL JOURNAL, MAY-JUNE 1977 


PHYSICS 


Dynamic Central Peaks in a Crystalline Solid: KTaO3. K. B. Lyons and P. A. Fleury,
Phys. Rev. Lett., 37 (July 19, 1976), pp. 161-164. We report two central peaks in the
quasielastic-light-scattering spectrum of KTaO3. The polarization and angular dependence
of the linewidth indicate that the narrow component (2.3 ± 0.3 GHz at 300 K in right-angle
scattering) is due to entropy fluctuations. A tentative identification of the broader com- 
ponent with two-phonon processes is made. 
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